CHAPTER I
INTRODUCTION 

This  study  deals  with  the  problem  of  determining  whether  an 
observed  object  is  or  is  not  a member  of  a listed  class  of  objects. 
The  technique  developed  classifies  the  observed  object  as  listed 
or  unlisted  based  on  a priori  information  regarding  the  listed 
class alone. The technique ensures that the probability of
misclassifying a listed object as an unlisted one is kept under
a specified threshold while attempting to minimize the reverse,
"type II" probability of error.

In most identification processes the signatures of an object,
e.g., light, sound, or temperature, are assumed to be measurable
and convertible into electric signals, which are called
the responses of the object. The basic assumption in statistical
classification is that the response of each object has a probability
distribution. Some algorithms [1,2,3] were developed to identify
each object by using the available information, which ranges
from complete statistical knowledge of the distribution to no
knowledge except that which can be deduced from the measured
response.

More specifically, when the statistics of all the listed
objects are known, the response of an observed object is compared
with the responses of these known objects. The object is then
classified as one of the known objects by a predetermined criterion.
The set of known objects is catalogued beforehand and is called
the catalogued (or listed) class. Conventional classification
procedures assume that the object to be identified is in the list
of the known objects. However, since the list of the catalogued
objects is very often not exhaustive, this assumption is not
always valid.

Another often encountered problem is that the functional
forms of the distributions of the listed objects are known, but
some parameters of these distributions are unknown. A great deal
of effort[4,5] has been devoted to solving these kinds of problems,
namely, classification of the objects without knowing the parameters
of the distributions. Two procedures are usually employed
in solving this problem:

(1)  Estimation  of  the  parameters  and  their  use  to  do  the 
classification  as  ordinary  parametric  classification, 
and 

(2)  Classification  of  the  object  directly  without  knowing 
the  parameters  of  all  distributions. 

The first procedure turns out to be a learning process,
supervised or unsupervised depending on the labeling of the observed
samples. Both learning processes are themselves another
form of statistical estimation. The supervised process
labels each sample "to be learned" as an object in the catalogued
class, and the unsupervised process assigns a priori probabilities
to all the objects in the catalogued class although it does not
label any of the samples. Both processes rest on the underlying
assumption that all the learned samples originate from the listed
class, which is, as stated before, not always complete and
exhaustive.

The  second  procedure  is  a classification  which  does  not  use 
the  parameters  of  the  distributions  and  is  called  non-parametric 
classification.  It  does  not  require  any  information  about  the 
distribution  of  each  object  in  the  catalogued  class  but  the 
number  of  the  objects  and  their  deterministic  responses  have  to 
be known before an algorithm can be devised to do the classification.

The  problem  studied  in  this  work  is  different  from  the 
ordinary  parametric  classification  problem  in  that  it  includes 
an  additional  alternative,  the  presence  of  an  uncatalogued  class. 
It is also different from a non-parametric problem in an obvious
way. The alternatives in a non-parametric problem are clearly
defined while they are incompletely defined in this problem.
The parameters of the distributions are not known in a non-parametric
problem but are not necessarily unknown in our problem.

The fact that all the classification schemes ignore the
existence of the unlisted objects is due to the difficulty in
handling the problem, not because of its insignificance. Since
the unlisted objects are usually unknown to a system designer, it
is very difficult to predict the performance of a system classifying
the unlisted class as distinct from the listed class. Also,
since the statistics of the unlisted class are unknown, there is
no way that an "optimum" criterion can be set up to guide the
design of such a system. However, for some classifiers it is
important to identify an unexpected or unknown object. For instance,
the radar detectors in air surveillance are developed to
detect airborne objects. There are many schemes developed to
identify an object after a radar detects some response from a
target. A conventional aircraft classifier detects the electromagnetic
scattering return from an airplane, compares it with
the list of the responses of all the known aircraft targets, and
attempts to classify an observed target into one of them[6,7].

Yet, the stored data of the responses are by no means complete
and exhaustive. When a target is present and its response looks
"odd" on the radar, it is extremely important not to exclude the
possibility of its being an unknown airborne target, perhaps
newly developed. Therefore, it is essential to determine at the
very beginning stage of classification if the observed target is
in the list. Once it is decided that the target is in the listed
class, one can use a conventional scheme to classify it as one
of the known targets. If it is not, the target is designated as
a new one and a learning process is employed to estimate its
response.

Another example is the Electrocardiogram (EKG) used to
diagnose a disease. A person can be classified into nine different
states[8] which include the normal state. However, a peculiar
disease not described by any of these nine states may never be
recognized. It would be desirable to develop a scheme to identify
unexpected or unknown diseases.

It is therefore the purpose of this work to develop a method
to distinguish whether an object is in the listed or unlisted
class before one can use any conventional method to do the classification.
This is referred to as a problem of identifying uncatalogued
objects as distinct from catalogued objects.

Our main effort will be devoted to solving the above problem.
The method devised is based on the idea that an optimum classifier
is the one which minimizes the probability of classification
error. The developed approach tends to minimize the error probability
of identifying an uncatalogued object as a catalogued
one while fixing the probability of the other kind of classification
error, viz., identifying a catalogued object as an uncatalogued
one. The reasoning for this approach is described in
Chapter II, and a criterion proposed to guide the design of a
classifier is described. Chapter III discusses some of the
special properties of this classifier, which is applied to several
examples. The scheme is shown to be effective in distinguishing
the listed objects from unlisted ones. The algorithm is simple
in terms of implementation and computation time. In Chapter IV,
the scheme is applied to an aircraft identification problem and
the results are presented.


CHAPTER  II 

MODEL  DISCUSSION  AND  THE  PROPOSED  CRITERION 


In  this  chapter,  the  basic  philosophy  of  the  algorithm  is 
discussed.  Due  to  its  similarity  to  the  Neyman-Pearson  rule,  the 
latter  is  reviewed.  The  proposed  model  is  then  introduced. 

A. General Discussion

A classifier measures a set of N features from the response
of an observed object to form a point in an N-dimensional space.
The N-dimensional space is referred to as the observation space.
In other words, every observation can be represented as a point
in the observation space. Every observation is assumed to be a
measurement of the response of an object, which is corrupted by
some kind of noise. This noisy signal is then used to determine
to which class the observed object belongs.

In building an optimum classification scheme, it is necessary
to define what an optimum performance is for a system. An
appropriate criterion for judging the performance of a classification
system is the error probability in making decisions.
Minimizing this error probability is our ultimate goal in designing
a classification system. An optimum system is therefore the
one which minimizes the probability of error in the identification
process. In the problem considered, there are two kinds of errors
one can make in performing the classification, namely, classifying
a catalogued object as an uncatalogued one, and identifying
an uncatalogued object as a catalogued one. A reasonable approach
is to make the overall probability of misclassification as small
as possible, as is the case for a Bayes classifier. This requires
that

(1)  the  a priori  probabilities  for  each  class  be  given,  and 

(2)  the  probability  density  functions  of  the  observation 
vector  be  known  when  any  of  the  classes  are  present. 

The first one can be easily met by assigning each class an
a priori probability. The second condition, however, will never
be satisfied since no knowledge about the unlisted class is available.
A different approach is therefore taken as follows.

In making a decision, the observation space is divided into
two regions, say $Z_N$ and $Z_X$, corresponding to the catalogued class
N and the uncatalogued class X, respectively. The observed object is
classified to N (or X) if the measured response falls into $Z_N$ (or
$Z_X$). Therefore, the probabilities of making errors in the identification
are

$$\beta = P\{x \in Z_N \mid X\} \qquad (1)$$

$$\alpha = P\{x \in Z_X \mid N\} , \qquad (2)$$

where x is the observed vector, $\alpha$ represents the probability of
error when N is true, and $\beta$ represents the error probability when X
is true.


In terms of the distributions of the observed vector x, these
two equations can be expressed as

$$\beta = \int_{Z_N} P(x|X)\,dx , \qquad (3)$$

and

$$\alpha = \int_{Z_X} P(x|N)\,dx , \qquad (4)$$

where P(x|X) (or P(x|N)) is the probability distribution when X
(or N) is true. The integral is the probability that x falls
into $Z_N$ (or $Z_X$) when X (or N) is true.

It is clear that $\beta$ cannot be computed since the class X is
unknown and so is P(x|X). In designing a classifier, one
can try to minimize the second error probability $\alpha$ by shrinking
$Z_X$ if P(x|N) is known beforehand. However, when $Z_X$ shrinks, $Z_N$
expands, tending to increase $\beta$. This is contrary to the main
purpose of minimizing the overall probability of misclassification.

One way of solving the above difficulty is to fix $\alpha$ instead
of minimizing it while trying to minimize $Z_N$, i.e., to make $Z_X$ as
large as possible. This is similar to the basic assumption in
the Neyman-Pearson classification: minimizing one kind of error
probability while fixing the other one. This leads us to introduce
the Neyman-Pearson rule.


B. Conventional Classifiers

In the case of two classes, both of the two conventional
criteria, Bayes' and Neyman-Pearson's, classify the observed
object into one of the two known classes, say L and K. The two
classes L and K are sometimes called a learning class and an
exterior class respectively. We refer to the error made when L
is true as a type one error (or false alarm) and the other as a
type two error (or miss). We express their probabilities as

$$\alpha = P\{x \in K \mid L\} , \qquad (5)$$

$$\beta = P\{x \in L \mid K\} , \qquad (6)$$

where x is the observed vector, and $\alpha$, $\beta$ represent the probabilities
of type one and type two errors, respectively.

The Bayes classifier, using the knowledge of the a priori
information and the statistical characteristic of the corrupting
noise, minimizes the average weighted error (or risk) by adjusting
the decision process. An ordinary system governed by this
rule sets up a threshold to which all the received signals are
compared. The observed object is then assigned to one of the two classes.
The probability of misclassification of this scheme is minimum.

When the a priori probabilities of the two listed classes
are not known, a Neyman-Pearson criterion is usually used. The
basic idea is to specify one kind of error, whichever is considered
more important, say the first one $\alpha$, while minimizing the
other. Very often, the method turns out to be a threshold test
and the threshold depends upon the first kind of error only. The
characteristic, that the detector functions without knowledge
about K, is called uniformly most powerful (UMP)[5].


C.  Neyman-Pearson  Rule 

We first apply the Neyman-Pearson rule to a special example
below.

1. One-dimensional case

Suppose that there is only one object in each of the two
known classes and that the noise added to the signal is Gaussian
with zero mean and variance $\sigma^2$. In the absence of noise, the
signals from L and K are $S_1$ and $S_2$ respectively. The conditional
probabilities are therefore

$$P(x|L) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-S_1)^2}{2\sigma^2}\right] \qquad (7)$$

$$P(x|K) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-S_2)^2}{2\sigma^2}\right] , \qquad (8)$$

where x is the observed signal.

Again,  L and  K represent  the  two  known  classes. 

We want to keep the probability of making a type one error
at some fixed value, say $\alpha$, while minimizing the probability of
making a type two error. That is, we want to have

$$P_F \triangleq \int_{Z_K} P(x|L)\,dx = \alpha \qquad (9)$$

and minimize

$$P_M \triangleq \int_{Z_L} P(x|K)\,dx . \qquad (10)$$

$Z_L$ and $Z_K$ are two disjoint regions occupying the whole observation
space, and are associated with L and K respectively.
The object whose measured response lies in either of them is assigned
to the corresponding class.

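To solve the problem, construct a function F as follows:

$$F \triangleq P_M + \lambda\,[P_F - \alpha] = \lambda(1-\alpha) + \int_{Z_L} [P(x|K) - \lambda P(x|L)]\,dx . \qquad (11)$$

Since $P_F - \alpha = 0$, minimizing $P_M$ is equivalent to minimizing F.
Now, for any positive $\lambda$ we would like the integrand to be as small
as possible (i.e., to be negative or zero) so that F is minimized.
Therefore, if

$$P(x|K) - \lambda P(x|L) < 0 , \text{ assign point } x \text{ to } Z_L . \qquad (12)$$

Or, if

$$\frac{P(x|K)}{P(x|L)} < \lambda , \text{ assign point } x \text{ to } Z_L . \qquad (13)$$

Defining

$$\Lambda(x) \triangleq \frac{P(x|K)}{P(x|L)} = \exp\left[\frac{2x(S_2-S_1) - (S_2^2-S_1^2)}{2\sigma^2}\right] , \qquad (14)$$

we have

$$\Lambda(x) < \lambda : \text{ assign to L, and to K otherwise.} \qquad (15)$$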


Equation (9) can also be written as

$$P_F = \int_{\lambda}^{\infty} P(\Lambda|L)\,d\Lambda = \alpha . \qquad (16)$$

The function $P(\Lambda|L)$ can be evaluated by using the transformation
equation[9]

$$P(\Lambda|L) = P(x(\Lambda)|L) \left|\frac{dx(\Lambda)}{d\Lambda}\right| I_\Lambda ,$$

where

$$I_\Lambda = \begin{cases} 1 & \text{in the region where } \Lambda \text{ is defined} \\ 0 & \text{otherwise} \end{cases}$$

(see Figure 1).

Figure 1. $I_\Lambda = 0$ for $\Lambda < 0$ since $\Lambda(x)$ is not
defined in this range.

$$P(\Lambda|L) = \begin{cases} \left|\dfrac{dx}{d\Lambda}\right| P(x(\Lambda)|L) , & \text{where } \Lambda > 0 \\ 0 , & \text{where } \Lambda < 0 \end{cases}$$

Therefore, we find a solution $\lambda^*$ for Equation (16), which,
substituted back into Equation (14), gives the threshold $x^*$.


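Case a. (Figure 2)

Equation (16) can be transformed back to the form of P(x|L)
when $S_2 > S_1$ (see Figure 1 and Equation (15)),

$$P_F = \int_{x^*}^{\infty} P(x|L)\,dx = \alpha . \qquad (18)$$

This leads to

$$\mathrm{erfc}\left(\frac{x^*-S_1}{\sigma}\right) = \alpha \qquad (19)$$

or

$$x^* = \sigma z_\alpha + S_1 , \qquad (20)$$

where erfc(x) and $z_\alpha$ are defined as follows:

$$\mathrm{erfc}(x) \triangleq \int_x^{\infty} \frac{e^{-y^2/2}}{\sqrt{2\pi}}\,dy \qquad (21)$$

and

$$\mathrm{erfc}(z_\alpha) \triangleq \alpha . \qquad (22)$$

Figure 2. The threshold $x^*$ for $S_2 > S_1$.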
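Case b. $S_2 < S_1$ (Figure 3)

It is clear that for this case

$$\int_{-\infty}^{x^*} P(x|L)\,dx = \alpha , \qquad (23)$$

and the threshold

$$x^* = S_1 - \sigma z_\alpha .$$

Figure 3. The threshold for $S_2 < S_1$.

The probability of making a type two error $P_M$ for both cases
a and b is

$$P_M = \mathrm{erfc}\left(\frac{|S_2-S_1|}{\sigma} - z_\alpha\right) , \qquad (26)$$

which is minimized according to the rule.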
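The one-dimensional rule above can be illustrated numerically. The following is a minimal sketch, not the author's implementation: the function names are hypothetical, and the erfc of Equation (21) (the upper-tail integral of the standard normal density) is evaluated with Python's `statistics.NormalDist` rather than `math.erfc`, which follows a different convention.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def tail(x: float) -> float:
    """The text's erfc(x): integral of the standard normal density from x to infinity."""
    return 1.0 - _STD.cdf(x)


def z_alpha(alpha: float) -> float:
    """Solve tail(z) = alpha for z, as in Equation (22)."""
    return _STD.inv_cdf(1.0 - alpha)


def threshold(s1: float, s2: float, sigma: float, alpha: float) -> float:
    """Threshold x*: above S1 when S2 > S1 (case a), below S1 when S2 < S1 (case b)."""
    z = z_alpha(alpha)
    return s1 + sigma * z if s2 > s1 else s1 - sigma * z


def classify(x: float, s1: float, s2: float, sigma: float, alpha: float) -> str:
    """Assign x to 'K' when it lies beyond the threshold on the S2 side, else to 'L'."""
    xs = threshold(s1, s2, sigma, alpha)
    if s2 > s1:
        return "K" if x > xs else "L"
    return "K" if x < xs else "L"


def miss_probability(s1: float, s2: float, sigma: float, alpha: float) -> float:
    """Type two error probability for both cases, following Equation (26)."""
    return tail(abs(s2 - s1) / sigma - z_alpha(alpha))
```

Note that `threshold` uses $S_2$ only to decide on which side of $S_1$ the threshold lies, while `miss_probability` depends on the separation $|S_2 - S_1|$.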


Discussion 


A very common application of the above result is in radar
detection. L is set to be the absence of a target and K its presence.
The false alarm probability is kept to some prefixed constant, say $\alpha$.
A CW radar detects the backscattered returns and assigns the object
to one of the above cases. If $S_2$, the noise free response
of the target, is assumed to be greater than zero, a threshold
is set at $x^* = \sigma z_\alpha$, provided that the injected noise is Gaussian
with zero mean and variance $\sigma^2$. The noise is assumed to be
Gaussian since it is the most commonly encountered type. The
threshold $x^*$ is completely determined by the system parameters
$\sigma$ and $\alpha$ even without knowing the amplitude of $S_2$. Nevertheless,
it is based on the assumption that $S_2 > 0$ and $S_1 = 0$. When the noise
free signal $S_2$ is not greater than or equal to zero, the result
is different. It is clear that if we have no a priori information
about the response of the external class K, we have no way of
finding the threshold $x^*$ according to the rule. A more complicated
case with some knowledge of K will be brought up in the
next section.


2. n-dimensional case

If the response of an object is an n-tuple x, corrupted by
zero mean Gaussian noise, the conditional probability density
function is

$$P(x|L) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(x-S_1)^T \Sigma^{-1} (x-S_1)\right] \qquad (27)$$

and

$$P(x|K) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(x-S_2)^T \Sigma^{-1} (x-S_2)\right] , \qquad (28)$$

where
x is the observed signal (column) vector,
$S_1$ and $S_2$ are noise free signal (column) vectors, and
$\Sigma$ is an n x n noise covariance matrix.

The above can be simplified by a linear transformation as
described in [10] such that

$$P(x'|L) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\sum_i \frac{(x'_i - S'_{1i})^2}{2\sigma^2}\right] \qquad (29)$$

and

$$P(x'|K) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\sum_i \frac{(x'_i - S'_{2i})^2}{2\sigma^2}\right] . \qquad (30)$$

The summation above is carried over i=1 to i=n. The diagonalization
of the noise covariance matrix and the equalization of the
variance do not require the uncorrelatedness of the individual
components and the above process can be applied for any Gaussian
process[11].
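As a hedged illustration of such a linear transformation, the sketch below whitens a two-dimensional observation with a Cholesky factor of the noise covariance matrix. This is one possible choice of transformation and is an assumption here; the construction in [10] may differ. The function names are hypothetical. After the mapping, the squared Euclidean distance between transformed points equals the quadratic form in the exponent of Equations (27) and (28) scaled by $\sigma^2$.

```python
import math


def cholesky_2x2(cov):
    """Lower-triangular factor (l11, l21, l22) of a symmetric positive definite 2x2 matrix."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return l11, l21, l22


def whiten(x, cov, sigma=1.0):
    """Map x to x' = sigma * L^{-1} x, so that the noise covariance becomes sigma^2 * I."""
    l11, l21, l22 = cholesky_2x2(cov)
    y1 = x[0] / l11                  # forward substitution, first row
    y2 = (x[1] - l21 * y1) / l22     # forward substitution, second row
    return (sigma * y1, sigma * y2)


def mahalanobis_sq(x, s, cov):
    """Quadratic form (x - s)^T cov^{-1} (x - s) for a 2x2 covariance matrix."""
    d0, d1 = x[0] - s[0], x[1] - s[1]
    det = cov[0][0] * cov[1][1] - cov[0][1] ** 2
    return (cov[1][1] * d0 * d0 - 2.0 * cov[0][1] * d0 * d1 + cov[0][0] * d1 * d1) / det
```

Because the mapping is linear, the same function transforms both the observation and the noise free signal vectors.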

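Equations (29) and (30) can be further simplified by rotating
and shifting the coordinate axis to obtain

$$P(x''|L) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\sum_{i=1}^{n} \frac{x_i''^2}{2\sigma^2}\right] \qquad (31)$$

and

$$P(x''|K) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{(x_1'' - |S'_2 - S'_1|)^2 + \sum_{i=2}^{n} x_i''^2}{2\sigma^2}\right] , \qquad (32)$$

where the absolute value represents the length of the vector.
Using the same technique as before, one obtains

$$\Lambda(x'') = \exp\left[\frac{2x_1''\,|S'_2 - S'_1| - |S'_2 - S'_1|^2}{2\sigma^2}\right] > \lambda : \text{ assign to K, to L otherwise.} \qquad (33)$$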


Figure 4. Rotating and shifting coordinate axis to
obtain a simpler expression for
Equations (29) and (30).


It is clear that the decision boundary is a straight line
in a two-dimensional space, a plane in a three-dimensional space
and a hyperplane in an n-dimensional space in the transformed
coordinate system x''. This boundary surface is then used, with
the knowledge of the previous linear transformation, to obtain the
threshold boundary in the original system.

Figure 5. The distance between $S'_1$ and the
decision boundary in the x'
coordinate system is $z_\alpha \sigma$.

It is interesting to note that the distance between the decision
boundary and $S'_1$ is still $z_\alpha \sigma$ (Figure 5). Obviously, the decision
plane depends on the orientation of $S'_1$ and $S'_2$ and their relative
positions too. However, it is always a hyperplane tangential
to the hypersphere, whose radius is $z_\alpha \sigma$, centered at $S'_1$ in the x'
coordinate system (Figure 6). Note that the radius is independent
of $S'_2$.


The error probability $P_M$ for this n-dimensional case is
obtained similarly as

$$P_M = \mathrm{erfc}\left(\frac{|S'_2 - S'_1|}{\sigma} - z_\alpha\right) , \qquad (35)$$

where $S'_2$ and $S'_1$ are the transformed vectors of $S_2$ and $S_1$ discussed
before.
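The n-dimensional rule can be sketched as a projection test. The following is an illustrative sketch under the assumption that the inputs are already in the transformed x' coordinates (noise covariance $\sigma^2 I$) and that the two signal vectors differ; the function names are hypothetical. The projection of $x - S'_1$ onto the unit vector from $S'_1$ to $S'_2$ plays the role of $x_1''$, and the decision compares it with $\sigma z_\alpha$.

```python
import math
from statistics import NormalDist


def _distance(s1, s2):
    """Euclidean length of the vector from s1 to s2."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(s1, s2)))


def classify_nd(x, s1, s2, sigma, alpha):
    """Assign x to 'K' when its projection along s2 - s1 exceeds sigma * z_alpha."""
    dist = _distance(s1, s2)  # assumed nonzero
    # Component of (x - s1) along the unit vector pointing from s1 to s2.
    proj = sum((xi - a) * (b - a) for xi, a, b in zip(x, s1, s2)) / dist
    z = NormalDist().inv_cdf(1.0 - alpha)
    return "K" if proj > sigma * z else "L"


def miss_probability_nd(s1, s2, sigma, alpha):
    """Type two error probability following the form of Equation (35)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(_distance(s1, s2) / sigma - z)
```

Components of x orthogonal to $S'_2 - S'_1$ do not affect the decision, which is why the boundary is a hyperplane.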

Note that when a train of observed signals, $x_1, x_2, \ldots, x_n$,
are taken independently, the noises added to each component are
very often independent of one another. One can simply use
Equations (29) and (30) directly.


D. Middleton's Modified Neyman-Pearson Rule

When there is more than one element in either of the two
classes, or when $S_1$ and $S_2$ are statistically distributed in the
observation space Z, Middleton proposed to fix the average of $P_F$,
i.e., $\langle P_F \rangle$, instead of $P_F$ and minimize the average of $P_M$, i.e.,
$\langle P_M \rangle$[12]. This requires the conditional probability density
functions $P(S_1|L)$ and $P(S_2|K)$.


1.  One  element  in  L 

(a) Suppose there is only one element in the learning class
L, say $S_1$, in a one-dimensional observation space, and two elements
$S_{21}$ and $S_{22}$ in the exterior class K with

$$P(S_{21}|K) = p_1 \qquad (36)$$

and

[The equations that follow, through Equation (40), are illegible in the source.]

As before

$$P_F = \int_{\lambda^*}^{\infty} P(\Lambda|L)\,d\Lambda \qquad (16)$$

$$= \int_{Z_K} P(x|L)\,dx . \qquad (41)$$

Note that P(x|L) is independent of $S_{21}$, $S_{22}$.

When $S_{21}, S_{22} > S_1$ or $S_{21}, S_{22} < S_1$, the threshold turns out to be the
same as in the one element case obtained in the last section.

$$\Lambda(x) = A(x) + B(x)$$


When $S_{22} < S_1 < S_{21}$ (Figure 7) or $S_{22} > S_1 > S_{21}$, $Z_L = [x_1^*, x_2^*]$ can
be found by Equations (41) and (38). A table is constructed of
$\alpha(\lambda)$ vs $\lambda$ from Equation (16) and a $\lambda^*$ is chosen such that $\alpha(\lambda^*)$
is equal to the prefixed value. The corresponding $x_1^*$, $x_2^*$ are then
obtained from Equation (38).
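The search for $\lambda^*$ and the two end points of $Z_L$ can be sketched numerically. The following is a minimal sketch under stated assumptions, not the author's procedure: it assumes the averaged likelihood ratio is the $p_1$, $p_2$ weighted sum of two single-element ratios of the form of Equation (14) (the corresponding equation is not legible in the source), it assumes $S_{22} < S_1 < S_{21}$, and it replaces the table of $\alpha(\lambda)$ vs $\lambda$ by bisection. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def make_ratio(s1, s21, s22, p1, p2, sigma):
    """Assumed averaged likelihood ratio <P(x|K)> / P(x|L) for a two-element class K."""
    def term(s2, x):
        return math.exp((2 * x * (s2 - s1) - (s2 * s2 - s1 * s1)) / (2 * sigma ** 2))
    return lambda x: p1 * term(s21, x) + p2 * term(s22, x)


def crossing(ratio, lam, x_min, direction):
    """Point on one side of x_min (direction +1 or -1) where ratio(x) reaches lam."""
    step = 1.0
    while ratio(x_min + direction * step) < lam:  # expand until the ratio exceeds lam
        step *= 2.0
    inside, outside = x_min, x_min + direction * step
    for _ in range(100):                          # bisection between the two points
        mid = 0.5 * (inside + outside)
        if ratio(mid) < lam:
            inside = mid
        else:
            outside = mid
    return 0.5 * (inside + outside)


def false_alarm(x1, x2, s1, sigma):
    """Probability that a response from L falls outside Z_L = [x1, x2]."""
    listed = NormalDist(s1, sigma)
    return 1.0 - (listed.cdf(x2) - listed.cdf(x1))


def solve_region(s1, s21, s22, p1, p2, sigma, alpha):
    """Return (x1, x2, lam) such that the false alarm probability equals alpha."""
    ratio = make_ratio(s1, s21, s22, p1, p2, sigma)
    a1, a2 = (s21 - s1) / sigma ** 2, (s22 - s1) / sigma ** 2
    b1 = -(s21 ** 2 - s1 ** 2) / (2 * sigma ** 2)
    b2 = -(s22 ** 2 - s1 ** 2) / (2 * sigma ** 2)
    # The ratio is convex in x; its minimum is where the derivative vanishes.
    x_min = (math.log(-p2 * a2 / (p1 * a1)) + b2 - b1) / (a1 - a2)

    def alpha_of(log_lam):
        lam = math.exp(log_lam)
        x1 = crossing(ratio, lam, x_min, -1)
        x2 = crossing(ratio, lam, x_min, +1)
        return false_alarm(x1, x2, s1, sigma), x1, x2, lam

    lo = math.log(ratio(x_min))       # alpha tends to 1 as lam approaches the minimum
    hi = lo + 1.0
    while alpha_of(hi)[0] > alpha:    # raise the upper bracket until alpha is undershot
        hi += 1.0
    for _ in range(100):              # bisection on log(lam)
        mid = 0.5 * (lo + hi)
        if alpha_of(mid)[0] > alpha:
            lo = mid
        else:
            hi = mid
    _, x1, x2, lam = alpha_of(hi)
    return x1, x2, lam
```

In the symmetric case ($S_{21} - S_1 = S_1 - S_{22}$, $p_1 = p_2$) the interval is centered on $S_1$, which gives a convenient sanity check for the sketch.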
The region $Z_L$ depends not only upon the relative positions
of $S_{21}$, $S_{22}$ to $S_1$, but also upon the values of $S_{21}-S_1$, $S_{22}-S_1$,
$p_1 = P(S_{21}|K)$ and $p_2 = P(S_{22}|K)$. The above conclusion holds in a
multi-dimensional case with the scalars being replaced by vectors.

When there are more than two elements in K, the computation
is even more complicated and results in a more intricate region
$Z_L$.

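(b) If there is only one element $S_1$ in the learning class
L and $S_2$ is distributed in the observation space Z as follows:

$$P(S_2|K) = \left(\frac{1}{2\pi\sigma_s^2}\right)^{n/2} \exp\left[-\frac{|S_2-S_1|^2}{2\sigma_s^2}\right] , \qquad (42)$$

i.e., normally distributed around $S_1$.

We have

$$\langle P(x|K) \rangle = \int P(S_2|K)\,P(x|S_2)\,dS_2 . \qquad (43)$$

Suppose the noises added to each component are independent
of one another and are again Gaussian with zero mean and variance
$\sigma^2$. Equation (43) becomes

$$\langle P(x|K) \rangle = \left(\frac{1}{2\pi(\sigma_s^2+\sigma^2)}\right)^{n/2} \exp\left[-\frac{|x-S_1|^2}{2(\sigma_s^2+\sigma^2)}\right] . \qquad (44)$$

Applying the same technique, we obtain

$$\Lambda(x) = \frac{\langle P(x|K) \rangle}{P(x|L)} \qquad (45)$$

$$= \left(\frac{\sigma^2}{\sigma_s^2+\sigma^2}\right)^{n/2} \exp\left[\frac{\sigma_s^2\,|x-S_1|^2}{2\sigma^2(\sigma_s^2+\sigma^2)}\right] , \qquad (46)$$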

indicating that $\lambda^*$ or $|x^*-S_1|$ is again independent of $S_2$, and depends
on $\alpha$ only. The decision boundary here is a hypersphere with
the radius being a function of $\alpha$.

The above example demonstrates that if the distribution of
$S_2$ is symmetric with respect to $S_1$, the decision boundary will depend
upon the preset false alarm probability $\alpha$ only.
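For a concrete feel of the hypersphere boundary, the two-dimensional case admits a closed form. The sketch below is illustrative only and its function names are hypothetical: with independent Gaussian noise of variance $\sigma^2$ on each of two components, the distance of a response of L from $S_1$ exceeds R with probability $\exp(-R^2/2\sigma^2)$, so fixing that probability at $\alpha$ gives $R = \sigma\sqrt{-2\ln\alpha}$. A seeded Monte Carlo run checks the resulting false alarm rate.

```python
import math
import random


def radius_2d(sigma: float, alpha: float) -> float:
    """Radius R with P(|x - S1| > R | L) = alpha for two-dimensional Gaussian noise."""
    return sigma * math.sqrt(-2.0 * math.log(alpha))


def classify(x, s1, sigma: float, alpha: float) -> str:
    """Assign x to 'L' inside the circle of radius R around S1, and to 'K' outside it."""
    r = math.hypot(x[0] - s1[0], x[1] - s1[1])
    return "K" if r > radius_2d(sigma, alpha) else "L"


def simulated_false_alarm(s1, sigma: float, alpha: float, trials: int = 20000, seed: int = 0) -> float:
    """Fraction of simulated responses of L that the rule assigns to K."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        x = (rng.gauss(s1[0], sigma), rng.gauss(s1[1], sigma))
        if classify(x, s1, sigma, alpha) == "K":
            wrong += 1
    return wrong / trials
```

The radius depends on $\sigma$ and $\alpha$ only, in line with the conclusion that the boundary is independent of $S_2$.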


2.  Multiple  elements  in  L 

When  there  is  more  than  one  element  in  L,  the  computation 
becomes  more  complicated  but  the  rules  still  hold.  An  example  is 
given  here. 

Suppose there are two elements in L, whose corresponding
noise free points are $S_{11}$ and $S_{12}$ in a two-dimensional observation
space. Also, $S_{11} = -S_{12}$ and $P(S_{11}|L) = P(S_{12}|L) = 1/2$. $S_2$ is
uniformly distributed in a circle, centered at the origin with
the radius of $r_s$.

Then

$$\langle P(x|K) \rangle = \int P(x|S_2)\,P(S_2|K)\,dS_2 \qquad (47)$$

[the evaluated form of Equation (47) is illegible in the source],
where $I_0$ is the modified Bessel function of order zero and is an
even function of its argument (Figure 8).

Figure 8. A modified Bessel function of degree
zero is a two-dimensional hyperbolic
cosine function.

Also

[equation illegible in the source]

where $x \cdot S_{11}$ is the inner product of the two vectors x and $S_{11}$.

It is clear that the decision boundary is no longer a circle
here (Figure 9).


E.  Identifying  Unlisted  Objects 

So far, we have catalogued all the known objects into two
classes; viz., the learning class L and the exterior class K.
The observation space Z is divided into two disjoint regions $Z_L$
and $Z_K$, corresponding to the two classes L and K. The observed
object is assumed to be one of the known objects. Its response
is measured and assigned, according to the region where the measurement
falls, to one of the two classes. The classifier is
claimed to be optimum in that it minimizes one kind of classification
error while fixing the other one.

In our problem, there are also two disjoint classes: viz.,
the listed and the unlisted classes. To make a classification,
or more specifically, to decide whether an object is known or
not, there are also two kinds of errors, namely, the error made
when N is true and the error made when X is true. However, since
the objects in the unlisted class are unknown, no a priori information
about them is available. Consequently, minimizing the
error made when X is true or fixing this kind of error is not
possible.

We note that in the case of the Neyman-Pearson rule the observation
space Z is divided into two regions, $Z_L$ and $Z_K$, corresponding
to the two classes L and K. Similarly, in dealing with our
problem, we can also divide the observation space Z into two
regions $Z_N$ and $Z_X$, associated with the listed class and the unlisted
class respectively. Any observed measurement that falls
into the region $Z_N$ (or $Z_X$) will be identified as belonging to the
listed (or unlisted) class. The error made when N is true is
also called a type one error and the error made when X is true is
a type two error. The following criterion is used to determine
an optimum classifier.

1. The criterion

In discriminating uncatalogued objects from catalogued
objects, the observation space Z is divided into two disjoint
regions $Z_N$ and $Z_X$, associated with the listed and the unlisted
classes respectively. When the error probability of
classifying a known object in the listed class to the unlisted
class is prefixed to a specific value other than zero, the optimum
rule is the one which minimizes $Z_N$.

By minimizing $Z_N$ we mean minimizing $V(Z_N)$, the volume of
the region $Z_N$, where we use the notation

$$V(Z_N) \triangleq \int_{Z_N} dx .$$

The main idea here is similar to that of the Neyman-Pearson
classifier. In the latter, the probability of making a type two
error is minimized while that of making a type one error is fixed.
In our case, a type two error is impossible to determine; instead,
the type one error probability is fixed, while the volume of $Z_N$,
the region associated with the listed class, is minimized. By
minimizing $V(Z_N)$, the possibility of an unlisted object response
falling into $Z_N$ will also be minimized, thus reducing the possibility
of making a type two error. This can be clearly seen from
Equation (3). Since P(x|X) is always positive, whenever the
region $Z_N$ decreases, $\beta$ will be reduced accordingly.

This is indeed the optimum classifier which we need in
identifying an unknown object. The criterion prefixes the type
one error probability such that the region $Z_N$ is large enough
not to exclude noisy responses from the known objects. Meanwhile,
$Z_N$ is kept by the criterion as small as possible so that
the classifier can most effectively identify an unknown object.

Therefore, the main philosophy in designing a classifier
to identify an unknown object is based on the following two
principles:

1. Keeping the probability of misclassifying a listed
object as unlisted at a fixed value. This is done
by fixing the error probability $\alpha$ when constructing
the region $Z_N$.

2. Minimizing the likelihood of classifying an uncatalogued
object as catalogued. This is accomplished by
minimizing the volume of $Z_N$, the region associated
with the known class.

Note that the last rule implies minimizing the type two
error probability without any information regarding the unknown
class.

$Z_N$ can be a compact region or a set of several disjoint
regions, depending upon the distribution of the observed vector
x in the observation space Z. However, the volume of $Z_N$ is
always finite when the error probability $\alpha$ is greater than zero
(when $\alpha$ equals zero, the problem becomes trivial and is not considered
here). This will be proved later. Since the volume of
the region is minimized, the criterion is referred to as a
minimum volume criterion. The two regions $Z_N$ and $Z_X$ are disjoint
and occupy the whole observation space. Some of their properties
will be further discussed in the next chapter.
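A small numerical illustration of the minimum volume criterion may help. The sketch below is an assumption-laden example, not the construction used in this work: it takes a listed class with a single object whose one-dimensional response is Gaussian, considers interval regions $Z_N$ that all have the same type one error probability $\alpha$ but split it differently between the two tails, and compares their lengths (the one-dimensional volume). The function names are hypothetical.

```python
from statistics import NormalDist


def region(s: float, sigma: float, alpha: float, left_tail: float):
    """Interval Z_N leaving probability left_tail below it and alpha - left_tail above it.

    Requires 0 < left_tail < alpha.
    """
    listed = NormalDist(s, sigma)
    lo = listed.inv_cdf(left_tail)
    hi = listed.inv_cdf(1.0 - (alpha - left_tail))
    return lo, hi


def volume(interval) -> float:
    """Length of the interval, i.e. V(Z_N) in one dimension."""
    return interval[1] - interval[0]


def type_one_error(interval, s: float, sigma: float) -> float:
    """Probability that a response of the listed object falls outside the interval."""
    listed = NormalDist(s, sigma)
    return 1.0 - (listed.cdf(interval[1]) - listed.cdf(interval[0]))
```

For example, with $\alpha = 0.1$ the symmetric split `region(0.0, 1.0, 0.1, 0.05)` is shorter than the skewed split `region(0.0, 1.0, 0.1, 0.01)`, although both have the same type one error probability; among intervals, the symmetric one has the smallest volume for this density.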

2.  Null  class 

We classify an observed object into one of the two classes,
namely N and X, whenever we get some response from an object. This
actually implies that the classifier may make an identification
even without the presence of any object, so long as a response is
shown in the measurement. The case of no object present is independent
of the two aforementioned classes and is another kind of

class  itself.  It  is  therefore  called  a null  class  and  is  denoted 


In  a practical  system,  noise  is  added  to  the  original  sig- 
nal and  is  reflected  in  all  measurements.  Therefore,  when  a 
detector  detects  a signal,  the  possibility  that  a tested  vector 
is  originating  from  a null  class  cannot  be  ruled  out  and  should 
be  taken  into  consideration  in  an  identification  problem.  The 
noise  free  signal  of  a null  class  is  obviously  zero,  invariant 
with  the  features  selected  and  may  be  considered  a known  object. 

Figure  10  shows  an  application  of  this  concept.  The  null 
class  is  combined  with  the  listed  class  N to  form  a new  class  N'. 
N' is then used, with the preset error probability for a type one
error as defined before, to construct a region Z_N' for the new
listed class N'. A measured vector is tested to determine whether
it is in Z_N'. If it is in Z_N', a conventional scheme is used to
identify the object as one of the catalogued objects, including the
case of no object.

Note that the relative frequency of occurrence of each
object in the new listed class, including that of the null class,
has to be known to construct the region Z_N'. Therefore, the a
priori probability of the null class when both the null class and
N are present has to be determined beforehand. Since it can range
from a small number close to zero to some number near one,
depending on the system, we will not include the null class in our
subsequent discussions. However, the process is similar to the N
case and is implicitly included in the general discussion. Unless
the a priori probability of the null class is close to one, in
which case Z_N is enlarged in the vicinity of the origin, no
special consideration is necessary for the null class.
Nevertheless, a design engineer should not ignore the existence of
a null class in a practical system.


CHAPTER  III 

SOME PROPERTIES OF THE REGION Z_N

The previous discussion illustrated the rules governing the
design of a classifier for identifying unlisted objects as
distinct from listed ones. The goal of the design is to find a
region Z_N satisfying the constraint that the volume V(Z_N) be
minimized while fixing the probability of making a decision error
when the observed object is in the listed class. The problem of
finding a boundary surface separating the observation space Z
into Z_N and Z_X is equivalent to that of finding a threshold in a
transformed space. It will be shown that utilizing the threshold
greatly simplifies the problem.

We begin this chapter by demonstrating some characteristics
of the region Z_N introduced in the last chapter. With the
assumption that the probability density function of the observed
vector is continuous and bounded, we first show that there exists
some Z_N satisfying the proposed criterion and that under certain
conditions Z_N is unique. The proof also reveals a method of
constructing Z_N.

A method of transformation is then deduced from the above
results. A few one-dimensional examples are worked out
analytically. In general, however, a closed form solution is
difficult to obtain, due to the complexity of the transformation
for most of the functions encountered in practice. Yet, with the
help of Monte Carlo simulations, all the problems are numerically
solvable except in the cases where Z_N is not unique. Two
algorithms are proposed and applied to a practical aircraft
identification problem. The results show that the criterion leads
to an effective and easily implementable scheme.

The  proposed  criterion  is  compared  with  the  Neyman-Pearson 
criterion  in  a one-dimensional  case.  It  is  shown  that  our  scheme 
is  less  sensitive  to  a priori  information  of  the  unlisted  class 
than  a Neyman-Pearson  classifier  to  the  a priori  information  of 
the exterior class. However, in terms of the probability of
misclassification, the latter does perform better because it
minimizes the error probability while the former minimizes the
volume of the region only. The tradeoffs are also discussed.

A.  Some Properties of Z_N

A classifier  measures  a noise  corrupted  response  of  an 
object,  say  x,  and  assigns  the  observed  object  into  the  listed 
class  or  the  unlisted  one.  The  probability  density  function  of 
the  measurement  x when  the  listed  class  is  present  is  denoted  by 

$$ g(x) \triangleq P(x \mid N) \tag{50} $$

In case there are several subclasses (or objects)
C_1, C_2, ..., C_n in the listed class, Equation (50) becomes

$$ g(x) = \sum_{i=1}^{n} P(x \mid C_i)\, P(C_i \mid N) \tag{51} $$

where

n is the number of subclasses (or objects),

P(C_i|N) is the relative frequency of occurrence of C_i
in N, and

P(x|C_i) is the conditional probability density function
when C_i is present.
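
As a rough illustration of Equation (51), the listed-class density
can be evaluated as a weighted sum of subclass densities. The sketch
below assumes one-dimensional Gaussian subclass densities; the
function names and all numerical values are hypothetical and are not
taken from this study.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """1-D Gaussian density; stands in for P(x|C_i) (an assumption)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def listed_class_density(x, means, sigmas, weights):
    """g(x) = sum_i P(x|C_i) P(C_i|N), following Equation (51)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("relative frequencies P(C_i|N) must sum to one")
    return sum(w * gaussian_pdf(x, m, s)
               for m, s, w in zip(means, sigmas, weights))

# Hypothetical listed class with two equally frequent subclasses.
g0 = listed_class_density(0.0, means=[0.0, 5.0],
                          sigmas=[1.0, 1.0], weights=[0.5, 0.5])
```

The weights play the role of P(C_i|N), so they are checked to sum to
one before the mixture is evaluated.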

If the ith noise free signal S_i is distributed over the
observation space Z and its distribution function is denoted by
f_i(S_i|C_i), Equation (51) becomes

$$ g(x) = \sum_{i=1}^{n} P(C_i \mid N) \int_Z P(x \mid S_i)\, f_i(S_i \mid C_i)\, dS_i \tag{52} $$


Here, P(x|S_i) is the probability density function of the
measured vector x when the noise free signal is S_i. The
integration is carried out over the entire observation space.

Obviously, if there is only one subclass in N, Equation (52)
can be simplified as

$$ g(x) = \int_Z P(x \mid S)\, f(S \mid N)\, dS \tag{53} $$

where, again, f(S|N) is the probability density function of the
noise free signal S.

In general, g(x) is either continuous or discrete in the
observation space. However, in most practical cases, like
those where Gaussian noise is added to the noise free signal, g(x)
is continuous and defined over the whole observation space.
Therefore, in the following discussions, we will make the
assumption that g(x) is continuous and bounded everywhere in the
observation space.




From the basic characteristics of a probability density
function we also have

$$ \int_Z g(x)\,dx = 1 \tag{54} $$

and

$$ g(x) \ge 0 \quad \text{for all } x \text{ in } Z \tag{55} $$

By definition, Z_N is a region of the observation space in which
an observed signal is assigned to the listed class. The
probability of misclassifying known objects as unknown is therefore

$$ P_F = \int_{Z_X} g(x)\,dx \tag{56} $$

where

$$ Z - Z_N = Z_X \tag{57} $$

and Z_N ∪ Z_X = Z by definition.

In other words, P_F can be expressed as

$$ P_F = 1 - \int_{Z_N} g(x)\,dx \tag{58} $$

Our criterion is to hold P_F at a fixed value and to minimize
the region Z_N, viz., minimize

$$ V(Z_N) \tag{59} $$

provided that V(Z_N) exists.


The observation space Z, as stated before, is divided into
two disjoint regions Z_N and Z_X by a threshold surface S_T. The
classification rules are as follows:

Let N denote the listed (known) class, X the unlisted class
and C_x the observed object.

if x ∈ Z_N, C_x is assigned to N,
if x ∈ Z_X, C_x is assigned to X, and
if x is on S_T, C_x can go either way.

Usually S_T is contained in either Z_N or Z_X, indicating that
Z_N is a closed or an open set. Our main objective here is to find
a threshold surface S_T.

We now show that for a continuous and bounded density
function g(x) there exists at least one region Z_N described by the
criterion. We first state and prove a lemma below and use it to
prove the existence of Z_N.


1. Lemma

Assume that g(x), a probability density function, is
continuous and bounded in the observation space Z and α is a number
such that 0 < α < 1. If a region Z_N in Z satisfies

(1)
$$ \int_{Z_N} g(x)\,dx = 1 - \alpha \tag{60} $$

(2) for any x ∈ Z_N and y ∉ Z_N,
$$ g(x) \ge g(y) \tag{61} $$

and a region Z_N' in Z, different from Z_N, satisfies

$$ \int_{Z_{N'}} g(x)\,dx = 1 - \alpha \tag{62} $$

then

$$ V(Z_N) \le V(Z_{N'}) \tag{63} $$

2. Proof

We first show that the region specified by Equations (60)
and (61) is finite and then show that for any Z_N' specified by
Equation (62), Equation (63) holds.

Since 1 > α > 0, from Equation (60) we have

$$ \int_{\bar{Z}_N} g(y)\,dy = \alpha > 0 $$

where \bar{Z}_N is the complement of Z_N. This equation shows that
not all g(y) = 0 for y ∈ \bar{Z}_N.

Therefore, there must exist a y_1 ∉ Z_N such that g(y_1) = a > 0.
From the given condition Equation (61), we have

$$ g(x) \ge g(y_1) = a > 0 \quad \text{for any } x \in Z_N $$

Also, by Equation (60) and the above,

$$ 1 - \alpha = \int_{Z_N} g(x)\,dx \ge a \int_{Z_N} dx $$

we get

$$ \int_{Z_N} dx = V(Z_N) \le \frac{1 - \alpha}{a} $$

Since a > 0, V(Z_N) is finite.

If the region Z_N' specified by Equation (62) is infinite,
then Equation (63) is always true.

If Z_N' is finite and different from Z_N, then let Z_c denote
the region common to Z_N and Z_N', and let

$$ Z_{f'} \triangleq Z_{N'} - Z_c $$
$$ Z_f \triangleq Z_N - Z_c $$

From the given conditions

$$ \int_{Z_N} g(x)\,dx = \int_{Z_{N'}} g(x')\,dx' = 1 - \alpha $$

Subtracting the integral over Z_c from both sides, one gets

$$ \int_{Z_f} g(x)\,dx = \int_{Z_{f'}} g(x')\,dx' $$

However, since

$$ g(x) \ge g(x') \quad \text{for any } x \in Z_f,\; x' \in Z_{f'} $$

the above equality holds only if

$$ \int_{Z_f} dx \le \int_{Z_{f'}} dx' $$

That is,

$$ V(Z_f) \le V(Z_{f'}) $$

Adding the volume of Z_c on both sides, we have

$$ V(Z_N) \le V(Z_{N'}) $$

This proves the lemma.


We have shown that among all the regions satisfying Equation
(62), the one specified by Equation (61) is finite and has the
minimum volume. Now we show that, for any g(x) that is continuous
and bounded, there exists at least one minimum region satisfying
Equation (62).

We first construct a region

$$ S(\xi) \triangleq \{x \mid g(x) \ge \xi\}, \quad \text{for } \xi > 0 \tag{64} $$

i.e., a region formed by all points for which the value of the
function g(x) is larger than or equal to ξ.
Its complement is denoted by

$$ \bar{S}(\xi) \triangleq Z - S(\xi) = \{x \mid g(x) < \xi\}, \quad \text{for } \xi > 0 \tag{65} $$

It can be proved, following the same line of reasoning which
proved that V(Z_N) was finite, that V(S(ξ)) is always finite if
ξ > 0.

From the lemma, if

$$ \int_{S(\xi)} g(x)\,dx = 1 - \alpha \tag{66} $$

then S(ξ) is a minimum region constrained by the criterion.

Now  we  prove  the  following  theorem. 

Theorem

For any continuous and bounded probability density function
g(x), and an α such that 0 < α < 1, there exists a minimum Z_N in Z
satisfying

$$ \int_{Z_N} g(x)\,dx = 1 - \alpha $$


Proof

Define

$$ G(\xi) \triangleq \int_{S(\xi)} g(x)\,dx \tag{67} $$

where S(ξ) is defined by Equation (64).

By the definition of S(ξ) and continuity of g(x),

$$ S(\xi_2) \supset S(\xi_1) \quad \text{if } \xi_m \ge \xi_1 > \xi_2 > 0 $$

where ξ_m is the maximum of g(x) in the observation space Z.

Note that S(ξ_2) strictly contains S(ξ_1). Therefore we can
obtain

$$ G(\xi_2) > G(\xi_1) \quad \text{if } \xi_m \ge \xi_1 > \xi_2 > 0 $$

indicating that G(ξ) is a strictly decreasing function.

Also, from the basic characteristics of a probability
density function, we have

$$ G(0) = 1 $$

and

$$ G(\xi) = 0 \quad \text{for } \xi > \xi_m $$




(a)  If G(ξ) is continuous over [0, ξ_m] (Figure 11), then
since G(ξ) is strictly decreasing in ξ, there is a
one-to-one correspondence between ξ and G(ξ). For every
α with 0 < α < 1, we find a unique ξ_T such that

$$ G(\xi_T) = 1 - \alpha \tag{68} $$

ξ_T is between 0 and ξ_m and is not equal to 0 or ξ_m
since 0 < α < 1.

As shown before, S(ξ_T) is the minimum region Z_N.
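
Case (a) can be illustrated numerically. The sketch below is only an
approximation under stated assumptions: g(x) is one-dimensional, it
is sampled on a uniform grid, G(ξ) is replaced by a Riemann sum, and
Equation (68) is solved by bisection, which relies on G(ξ) being
decreasing. The function names, the grid size and the triangular
test density are hypothetical choices made for this illustration.

```python
def G(xi, g_vals, dx):
    """Riemann-sum approximation of G(xi): integral of g where g(x) >= xi."""
    return sum(v for v in g_vals if v >= xi) * dx

def solve_threshold(g, lo, hi, alpha, n=20001, iters=60):
    """Approximate xi_T with G(xi_T) = 1 - alpha, and the volume of S(xi_T)."""
    dx = (hi - lo) / (n - 1)
    g_vals = [g(lo + k * dx) for k in range(n)]
    xi_lo, xi_hi = 0.0, max(g_vals)      # G(xi_lo) is near 1, G(xi_hi) near 0
    for _ in range(iters):
        mid = 0.5 * (xi_lo + xi_hi)
        if G(mid, g_vals, dx) > 1.0 - alpha:
            xi_lo = mid                  # region still too large: raise threshold
        else:
            xi_hi = mid                  # region too small: lower threshold
    xi_t = 0.5 * (xi_lo + xi_hi)
    volume = sum(1 for v in g_vals if v >= xi_t) * dx
    return xi_t, volume

def triangular(x):
    """Hypothetical test density: unit-height triangle on [-1, 1]."""
    return max(0.0, 1.0 - abs(x))

xi_t, volume = solve_threshold(triangular, -1.5, 1.5, alpha=0.05)
```

The returned volume is the length of the superlevel set S(ξ_T) on the
grid, so its accuracy is limited by the grid spacing.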


(b)  If G(ξ) is not continuous in ξ, say discontinuous at
a point ξ_d (Figure 12), let

$$ S_1(\xi) \triangleq \{x \mid g(x) > \xi\} \tag{69} $$

Figure 12.  G(ξ) is discontinuous.

We have

$$ G(\xi_d) = G(\xi_d^+) + h \tag{70} $$

where h > 0 is the jump of G(ξ) at ξ_d and

$$ G(\xi_d^+) \triangleq \int_{S_1(\xi_d)} g(x)\,dx \tag{71} $$

From the definition of G(ξ), it is clear that

$$ G(\xi_d) - G(\xi_d^+) = \int_{S_0(\xi_d)} g(x)\,dx \tag{72} $$

where S_0 is defined as

$$ S_0(\xi) \triangleq \{x \mid g(x) = \xi\} \tag{73} $$

viz., the set of all the points where g(x) = ξ.

This indicates that there is a finite volume for
the region S_0(ξ_d) in which the function g(x) is
constant and equal to ξ_d, where the discontinuity of
G(ξ) occurs.

Consequently, for any number b such that 0 < b < h/ξ_d
we can obtain a region S_0b(ξ_d) in S_0(ξ_d) so that

$$ V(S_{0b}(\xi_d)) = b \tag{74} $$

This is equivalent to

$$ \int_{S_{0b}(\xi_d)} g(x)\,dx = \xi_d\, b \tag{75} $$


with


(77) 

(78) 

(79) 


S(ξ_T) is not unique since the region defined by Equations (78)
and (79) is not. Nevertheless, from the previous lemma, S(ξ_T) is
a minimum region.

For any (1 - α) lying in the continuous portion of G(ξ), we
can, as before, obtain a unique ξ_T for each α such that
G(ξ_T) = 1 - α since there is a one-to-one correspondence between
G(ξ) and ξ over this range. The corresponding S(ξ_T) is again a
minimum region.

In conclusion, for any continuous and bounded density
function g(x) and any α such that 0 < α < 1, we can always find a
minimum region satisfying

$$ \int_{Z_N} g(x)\,dx = 1 - \alpha $$

From the above discussion, we also have the following:

(1)  If Z_N satisfies the criterion, then for any x ∈ Z_N
and any y ∉ Z_N,

$$ g(x) \ge g(y) $$

(2)  If, for any x ∈ Z_N and any y ∉ Z_N,

$$ g(x) \ge g(y) $$

then Z_N is a minimum region among all the regions Z_N'
satisfying

$$ \int_{Z_{N'}} g(x)\,dx = \int_{Z_N} g(x)\,dx $$

(3)  If Z_N is a region which satisfies the criterion and
is closed, and if

$$ \xi_T = \min_{x \in Z_N} g(x) $$

then Z_N is unique unless V(S_0(ξ_T)) ≠ 0, that is, unless
the volume of the region over which g(x) = ξ_T is not zero.

A few  examples  are  given  here  to  demonstrate  how  the  above 
theorem  works. 

Example 1

When the listed class is present, the probability density
function of the observed vector x is triangular as shown in
Figure 13. We would like to find the region Z_N associated with
the listed class when α = 0.05, 0.01 and 0.001.

Obviously, g(x) is continuous and its integral over the
entire x-axis is one. Also, g(x) is bounded and ranges from 0 to 1.

Figure 13.  g(x) is a triangular function.

We first find the function G(ξ). As before,

$$ G(\xi) = \int_{S(\xi)} g(x)\,dx $$

where

$$ S(\xi) = \{x \mid g(x) \ge \xi\} $$

We obtain (Figure 14)

$$ G(\xi) = 1 - \xi^2, \quad 0 \le \xi \le 1 $$

From Equation (68),

$$ \xi_T = \sqrt{\alpha} \quad \text{when } 0 < \alpha < 1 $$

so that

    ξ_T = 0.2236    when α = 0.05,
        = 0.1       when α = 0.01,
        = 0.0316    when α = 0.001.

Corresponding to the above,

    Z_N = S(ξ_T) = [-0.776, 0.776]      when α = 0.05,
                 = [-0.9, 0.9]          when α = 0.01,
                 = [-0.9684, 0.9684]    when α = 0.001.
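
The intervals of Example 1 can be checked with a few lines of code.
The sketch assumes that the triangular density of Figure 13 is
g(x) = 1 - |x| on [-1, 1], which is inferred from the stated range of
g(x) and from the intervals quoted above rather than read from the
figure itself; the function name is hypothetical.

```python
import math

def example1_region(alpha):
    """Minimum region for the assumed density g(x) = 1 - |x| on [-1, 1]."""
    xi_t = math.sqrt(alpha)        # solves G(xi) = 1 - xi**2 = 1 - alpha
    half_width = 1.0 - xi_t        # g(x) >= xi_t  <=>  |x| <= 1 - xi_t
    return xi_t, (-half_width, half_width)

for alpha in (0.05, 0.01, 0.001):
    xi_t, (left, right) = example1_region(alpha)
    print(f"alpha={alpha}: xi_T={xi_t:.4f}, Z_N=[{left:.4f}, {right:.4f}]")
```

Under that assumption the printed half-widths agree with the
intervals listed in the example to the precision given there.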

Example 2

If g(x) is a function as shown in Figure 15, find a
corresponding Z_N when α is a fixed number in (0,1).

Figure 15.  g(x) including a flat probability density.

Again, g(x) is continuous and bounded everywhere on the
x-axis and its integral over the entire x-axis is unity. As
before, we first find G(ξ) by inspection:

$$ G(\xi) = \begin{cases} 1, & \xi \le 0 \\ 1 - \tfrac{1}{2}\xi^2, & 0 < \xi \le 0.5 \\ \tfrac{1}{2} - \tfrac{1}{2}\xi^2, & 0.5 < \xi \le 1 \\ 0, & \xi > 1 \end{cases} $$

It is clear that there is a one-to-one correspondence between
ξ and G(ξ) when ξ is in (0,1), except at ξ = 0.5, or equivalently,
when G(ξ) is between 0.375 and 0.875. We can, as before, obtain a
unique Z_N for any α with 0 < α < 1 whose value is not in
(0.125, 0.625).

When 0.125 < α < 0.625, we can set ξ_T = 0.5 and form a region
Z_N according to the method described in the theorem.

For instance, when α = 0.5, we can form a region Z_N in Z (i.e.,
the x-axis in this example) such that

$$ Z_N = [-0.25, 0.25] \cup S_{10}(0.5) $$

where S_10(0.5) is a region consisting of a segment or a set of
segments whose total length is 0.25 and which lies between
x = 0.25 and x = 1.25. However, to have a higher stability in the
classification process, a subregion from x = 0.25 to x = 0.5,
immediately adjacent to x = 0.25, is preferred in forming the
region Z_N.

This illustrates that for any α such that 0 < α < 1, there
always exists a region Z_N satisfying the criterion.

Figure 17.  S_10(0.5) is a region consisting of a segment or a
set of segments between x = 0.25 and x = 1.25; its total length
is 0.25.
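
The length of the flat segment used in Example 2 follows from simple
bookkeeping, sketched below. The relation
b = (1 - α - G(ξ_d+)) / ξ_d is an inference from Equations (74) and
(75), since a piece of the flat region of length b contributes
ξ_d b to the probability; it is not quoted from the text, and the
function and argument names are hypothetical.

```python
def flat_segment_length(alpha, xi_d, g_plus, g_at):
    """Length b of the flat region S_0(xi_d) to include in Z_N.

    g_plus is G just above the jump, g_at is G at the jump.
    Only meaningful when 1 - alpha falls inside the jump of G.
    """
    target = 1.0 - alpha
    if not (g_plus <= target <= g_at):
        raise ValueError("1 - alpha does not fall inside the jump of G")
    return (target - g_plus) / xi_d

# Values quoted in Example 2: jump of G at xi_d = 0.5 from 0.375 to 0.875.
b = flat_segment_length(alpha=0.5, xi_d=0.5, g_plus=0.375, g_at=0.875)

# Preferred choice in the text: the segment adjacent to x = 0.25.
z_n = (-0.25, 0.25 + b)
```

With the quoted values this gives a segment of length 0.25, so the
preferred region is the single interval from -0.25 to 0.5.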

B.  One-dimensional Case and Method of Transformation

It is clear that if V(S_0(ξ_T)) ≠ 0, the region Z_N is not
unique and there is an instability in constructing the classifier.
This can be more closely related to the function G(ξ) in Equation
(67). If the rate of change of G(ξ) with respect to ξ is large,
an instability would become noticeable. The rate of change is
strongly associated with the behavior of g(x) in the observation
space. In most practical situations, the error probability α
is set to be a small number. Under those circumstances, the
instability becomes almost irrelevant to the overall performance
because the effect is definitely small when ξ_T is small. It will
be shown later in Chapter III-E that even when g(x) is almost flat,
the instability is still not noticeable. The derivative of G(ξ) at
ξ_T can serve as an indicator of this instability.

However, if necessary, the instability can be eliminated
completely if the classification process is carried out in the
ξ domain, where a region Z_N can always be found and consists of
some compact regions. This essentially guarantees that the problem
is solvable as long as g(x) is continuous and bounded in the
observation space.

From the above discussions, it is seen that the solution
for our problem is unique when V(S_0(ξ)) = 0 for all ξ > 0. This
occurs in many practical situations and represents most of the
distribution functions that one will encounter. Any monotonic
function or the summation of a finite number of monotonic
functions are in this category, particularly Gaussian distributions
that occur in most communications or measuring systems. The
function g(x) consists of a finite number of such monotones if
the number of subclasses in the listed class is assumed to be
finite, which is a legitimate postulate in all practical cases.
Therefore, in this study, the function G(ξ) defined in Equation
(67) will be considered continuous and its corresponding region
S(ξ) unique.

With the assumptions made to construct the region Z_N,
associated with the listed class for a prefixed error
probability α, we can carry out the integration in Equation (67)
from the portion where the function g(x) has the maximum value, in
the descending order of g(x), until G(ξ) reaches the value 1 - α.

In doing so, we are actually carrying out the integration of g(x)
over S(ξ) in the descending magnitude of ξ. Therefore, we can
define a new variable P_ξ(ξ) such that

$$ P_\xi(\xi)\,d\xi = g(x)\,dS(\xi) = \xi\,dS(\xi) \tag{80} $$

at every ξ.

In a one-dimensional space, the right side of Equation (80)
is equivalent to

$$ g(x_1)\,|dx_1| + g(x_2)\,|dx_2| + \cdots + g(x_n)\,|dx_n| $$

where (Figure 18)

$$ \xi = g(x_1) = g(x_2) = \cdots = g(x_n) $$

Using the above two expressions, we obtain

$$ P_\xi(\xi) = \xi \sum_{i=1}^{n} \left| \frac{dx_i}{d\xi} \right| \tag{81} $$

provided that g(x) is continuous [9,13].

Figure 18.  The function y = g(x) intersects y = ξ
at x_1, x_2, ..., x_n.

Obviously, if the curve y = ξ intersects y = g(x) at a finite
number of points, the function P_ξ(ξ) can always be obtained by
the above transformation. Also, P_ξ(ξ) is positive and defined
over (0, ξ_m) and

$$ \int_0^{\infty} P_\xi(\xi)\,d\xi = \int_0^{\xi_m} P_\xi(\xi)\,d\xi = 1 \tag{82} $$

The threshold ξ_T is obtained by solving

$$ \int_{\xi_T}^{\xi_m} P_\xi(\xi)\,d\xi = 1 - \alpha \tag{83} $$

which can be used to determine the boundary surface S_T.

Solving Equation (83) is a straightforward step if P_ξ(ξ)
can be analytically determined from Equation (81).

Note that the transformation Equation (81) is almost the
same as that of transforming a probability density function from
one variable to another except that ξ here is not a random
variable [9,13]. P_ξ(ξ) is a function containing the probability
contributions of all the points in Z for which the function
g(x) = ξ. Reference [13] has a very thorough discussion about the
one-to-one and non one-to-one transformation for a continuous
function g(x). It also describes the transformation from Z space
into the ξ domain for a multiple-variable function
g(x_1, x_2, ..., x_n), which is denoted as g(x) in our notation.
As the number of dimensions increases, the transformation becomes
very complicated even when ξ = g(x) is a very simple function.
However, g(x) is usually not very simple and the transformation is
not an easy task in most situations. We will not go into any
detailed discussion of the transformation in multi-dimensional
space in this work. Instead, a method of Monte Carlo simulation is
presented later to ease this computational difficulty, which
enables us to bypass these problems numerically without having to
use the transformation process.
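
To give a feeling for how sampling can replace the transformation,
the sketch below uses a generic density-quantile technique: if X is
drawn from g, then the probability that g(X) ≥ ξ equals G(ξ), so the
empirical α-quantile of the values g(X) estimates ξ_T. This is an
illustration under stated assumptions and is not claimed to be
either of the algorithms proposed later in this work. The function
names, the sample size and the two-dimensional Gaussian test density
are hypothetical.

```python
import math
import random

def mc_threshold(sample, density, alpha, n=200_000, seed=0):
    """Estimate xi_T as the empirical alpha-quantile of g(X), X drawn from g."""
    rng = random.Random(seed)
    values = sorted(density(sample(rng)) for _ in range(n))
    return values[int(alpha * n)]

def density(x):
    """Hypothetical g(x): standard Gaussian in two dimensions."""
    return math.exp(-0.5 * (x[0] ** 2 + x[1] ** 2)) / (2.0 * math.pi)

def sample(rng):
    """Draw one observation from the hypothetical g(x)."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

xi_t = mc_threshold(sample, density, alpha=0.05)

def is_listed(x):
    """Assign x to the listed class when it falls inside S(xi_T)."""
    return density(x) >= xi_t
```

For this particular test density the exact threshold is α/(2π), which
gives a convenient check on the estimate; the membership test needs
only an evaluation of g(x), not an explicit description of the
boundary surface.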

One Element Case

A one-dimensional problem given in Section II-C is used here
to demonstrate the above approach. Suppose C_1 is the one element
in N and its noise free signal is S_1. No information about the
unlisted class is available. We would like to find a region Z_N
associated with the listed class N when the misclassification
probability for the listed class P_F is set to be α.


The probability density function of x when the listed class
is present is, as before, assumed to be

$$ g(x) = P(x \mid N) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - S_1)^2}{2\sigma^2}} $$

We first transform the above probability density function:

$$ P_\xi(\xi) = \xi \left( \left| \frac{dx_1}{d\xi} \right| + \left| \frac{dx_2}{d\xi} \right| \right) $$

where ξ ranges from 0 to 1/√(2πσ²).

From the symmetry of ξ = g(x) w.r.t. S_1, it can be shown that

$$ S_1 - x_1(\xi_T) = x_2(\xi_T) - S_1 $$

and Equation (83) becomes

$$ \int_{x_1(\xi_T)}^{S_1} g(x)\,dx + \int_{S_1}^{x_2(\xi_T)} g(x)\,dx = 2 \int_0^{x_2(\xi_T) - S_1} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}\,dx = 1 - \alpha $$

We have

$$ x_2(\xi_T) - S_1 = S_1 - x_1(\xi_T) = z_{\alpha/2}\,\sigma \tag{86} $$

where z is defined in Equation (22), or

$$ |x_i(\xi_T) - S_1| = z_{\alpha/2}\,\sigma \quad \text{for } i = 1, 2 \tag{87} $$

Corresponding to this,

$$ Z_N = [-z_{\alpha/2}\,\sigma + S_1,\; z_{\alpha/2}\,\sigma + S_1] \tag{88} $$

and

$$ \xi_T = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{z_{\alpha/2}^2}{2}} \tag{89} $$

Comparing this with Equation (25), we see that the boundaries
are no longer dependent on S_2, which is not known in our problem
anyway.
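
Equations (88) and (89) are easy to evaluate numerically. The sketch
below assumes that z_{α/2} of Equation (22), which is not reproduced
in this section, denotes the upper α/2 point of the standard normal
distribution; the function name and the numerical inputs are
hypothetical.

```python
import math
from statistics import NormalDist

def one_element_region(s1, sigma, alpha):
    """Z_N and xi_T for one Gaussian element, following Equations (88)-(89)."""
    # Assumed meaning of z_{alpha/2}: upper alpha/2 point of N(0, 1).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    xi_t = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma ** 2)
    return (s1 - z * sigma, s1 + z * sigma), xi_t

region, xi_t = one_element_region(s1=0.0, sigma=1.0, alpha=0.05)
```

For α = 0.05 the half-width is close to 1.96σ, which matches the
widely separated rows of Table 1.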


Two Element Case

If there are two elements S_1 and S_2 in N, the problem
becomes more complicated. Now g(x) becomes

$$ g(x) = p_1 P(x \mid S_1) + p_2 P(x \mid S_2) \tag{90} $$

where

$$ p_1 \triangleq P(S_1 \mid N) \tag{91} $$

$$ p_2 \triangleq P(S_2 \mid N) = 1 - p_1 \tag{92} $$

as defined before.

Figure 19.  Z_N is a region covering both sides of
S_1 symmetrically.

Again, assume the noise added to the signal to be Gaussian
of zero mean. Then

$$ g(x) = p_1 \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - S_1)^2}{2\sigma^2}} + p_2 \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - S_2)^2}{2\sigma^2}} \tag{93} $$

By transforming the above into the ξ domain and employing the
same technique as before, one gets

$$ \int_{\xi_T}^{\xi_m} P_\xi(\xi)\,d\xi = 1 - \alpha \tag{94} $$

Figure 20.  Two element case.

And ξ_T can therefore be solved if P_ξ(ξ) can be obtained by
Equation (81). The points x_1*, x_2*, x_3*, x_4* (Figure 20)
corresponding to ξ_T are then obtained by substituting ξ_T back
into ξ = g(x). Note that analytically, P_ξ(ξ) is difficult to
obtain because the points for which g(x) is constant are difficult
to compute for Equation (93), and even more so the derivatives at
these points. Nevertheless, it was carried out and the results are
listed in Table 1. The new region Z_N, in general, is the union of
two disjoint zones [x_1*, x_2*] and [x_3*, x_4*] since ξ = g(x) is
a multivalued function. Some results are shown in Table 1 for
p_1 = p_2 = 1/2, α = 0.05 and S_1 = 0.


TABLE 1

Two Element Case for the Misclassification
Probability α = 0.05, S_1 = 0

S_1/σ    S_2/σ    x_1*/σ    x_2*/σ    x_3*/σ    x_4*/σ

  0        1      -1.68       --        --       2.68
  0        2      -1.645      --        --       3.645
  0        5      -1.927     1.96      3.04      6.927
  0       10      -1.96      1.96      8.04     11.96
  0       50      -1.96      1.96     48.04     51.96

In the table, S_1 is set to be zero to simplify the
computation. This does not affect the generality of the results
since it is considered as a reference point; consequently, the
numbers S_2, x_1*, x_2*, etc. in the table are actually S_2 - S_1,
x_1* - S_1, x_2* - S_1, etc. The signal to noise standard deviation
(σ) ratios are associated with the SNR of the system. The interval
[x_3*, x_4*] is a symmetrical reflection of [x_1*, x_2*] with
respect to the midpoint of S_1 and S_2. For the two smallest
separations in Table 1 (S_2/σ = 1 and 2), the threshold line
ξ = ξ_T intersects ξ = g(x) only twice when α = 0.05; therefore
x_2* and x_3* do not exist in that case. Note that under these
conditions, |S_i - x_j*|, where j = 2(i-1) or 2(i-1)+1, is very
close to zero. As |S_2 - S_1| becomes larger in terms of σ, the
size of the subregion surrounding S_i, i.e., |S_i - x_j*|, becomes
closer to σ z_{α/2}, the same as that of Z_N found in the last
example. This is so because when all the subclasses are equally
weighted, the probability density contributed from other
subclasses around each element is small as long as the distances
between elements are large; consequently, the threshold boundaries
around each element are close to those of single element systems.
Thus, under the same constraint of α, the sizes and shapes of the
subregions of Z_N are like those of Z_N generated by single
elements. This indicates that if the noise free signals of each
element in the known class are far apart in terms of σ, Z_N will
be a joint region of several individual ones which are obtained as
if there were only one element in the listed class, provided that
each element is equally weighted. This conclusion holds no matter
how many subclasses there are in the listed class. We can
approximate Z_N by combining the subregions associated with each
subclass, obtained as if it were the only one in the listed class,
and reduce the complicated computation substantially.
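
The two-element boundaries can also be approximated on a grid, which
gives an independent check on Table 1. The sketch below accumulates
probability in descending order of g(x) until 1 - α is reached, in
the spirit of the integration order described for Equation (67). It
is a numerical approximation under stated assumptions (equal
weights, S_1 = 0, σ = 1, uniform grid of spacing dx); the function
names and grid limits are hypothetical, and this is not the
analytical procedure used to produce the table.

```python
import math

def mixture(x, s2, p1=0.5, sigma=1.0):
    """g(x) of Equation (93) with S_1 = 0."""
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return c * (p1 * math.exp(-0.5 * (x / sigma) ** 2)
                + (1.0 - p1) * math.exp(-0.5 * ((x - s2) / sigma) ** 2))

def min_volume_intervals(s2, alpha=0.05, dx=0.001):
    """Grid approximation of Z_N as a list of (left, right) intervals."""
    lo, hi = -8.0, s2 + 8.0
    n = int(round((hi - lo) / dx)) + 1
    xs = [lo + k * dx for k in range(n)]
    gs = [mixture(x, s2) for x in xs]
    # Add grid cells in descending order of g(x) until mass 1 - alpha is reached.
    order = sorted(range(n), key=lambda k: gs[k], reverse=True)
    mass, keep = 0.0, [False] * n
    for k in order:
        if mass >= 1.0 - alpha:
            break
        keep[k] = True
        mass += gs[k] * dx
    # Merge neighbouring kept cells into intervals.
    intervals, start = [], None
    for k in range(n):
        if keep[k] and start is None:
            start = xs[k]
        if start is not None and (k == n - 1 or not keep[k + 1]):
            intervals.append((start, xs[k]))
            start = None
    return intervals

intervals_far = min_volume_intervals(10.0)   # S_2/sigma = 10
intervals_near = min_volume_intervals(1.0)   # S_2/sigma = 1
```

For the widely separated case the result is close to the union of
two single-element regions of half-width 1.96σ, in line with the
approximation suggested above; for S_2/σ = 1 the two zones merge
into one interval.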


In the last example, if p_1 ≠ p_2, the function ξ = g(x) will no
longer be symmetric with respect to the midpoint of (S_1, S_2),
namely, (S_1 + S_2)/2 (Figure 21). Still, we can transform the
original function ξ = g(x) into the ξ domain and compute ξ_T by
Equation (94). The required region is again obtained by
substituting ξ_T back into Equation (93), i.e., by setting
ξ_T = g(x*). If there are two subregions of Z_N, each one,
associated with its noise free signal S_i, will be different in
size and shape. The two subregions will merge into one as α
decreases.


Another common situation similar to this occurs when the
amount of noise added to each signal is not the same. For
instance, the noise injected into each signal is proportional to
its signal amplitude in some systems. This multiplicative noise
will also result in sizes and shapes for the subregions of Z_N
that differ from those described in the above example.

When α is large, very often some of the noise free
points originating from the subclasses of the listed class are
not contained in Z_N at all (Figure 22).


Figure 22.  Z_N does not contain the signal
points S_1 and S_4.

This is most likely due to the following:

(1)  The a priori probabilities of some subclasses in
the listed class are very low as compared with those
of others. When α is large, Z_N consists of only the
highly weighted regions, leaving out the signal
points of low densities. Consequently, they are
excluded from Z_N.

(2)  The noises added to some signal points are larger
than those added to others, making the probability
contributions from these points smaller than those
from others. For a noise distribution that is
Gaussian or decreasingly unimodal, a classifier is
therefore more likely to exclude some signal points
corrupted by larger noise from Z_N when the detection
probability 1 - α is not very high.

C.  Comparisons of the Two Criteria

The  above  results  can  be  used  to  demonstrate  the  differences 
between  the  classifier  devised  by  Neyman-Pearson  criterion  and 
that  by  the  minimum  volume  criterion. 

Both  criteria  are  aiming  at  the  same  objective  of  minimi- 
zing one  kind  of  error  while  fixing  the  other.  The  lack  of  a 
priori  information  about  one  of  the  classes  leads  one  to  intro- 
duce the  idea  of  minimizing  the  region  associated  with  the  known 
class,  which  is  really  the  only  class  we  know  and  can  do  something 
about.  A Neyman-Pearson  classifier  can  fix  either  of  the  two 
error  probabilities  and  minimize  the  other  while  the  classifier 
devised  by  the  proposed  criterion  can  only  minimize  the  specified 
region  and  fix  the  error  probability  associated  with  the  known 
class.  This  indicates  that  the  former  is  more  flexible  in  choos- 
ing a class  whose  associated  error  is  to  be  fixed.  However,  in 
almost  all  practical  situations,  a system  very  often  gives  a 
strong  preference  in  fixing  one  of  the  two  error  probabilities, 
thus  making  the  two  classifiers  almost  identical  in  terms  of  the 
flexibility  with  respect  to  this  kind  of  selection. 

Another difference between the two classifiers is the
quantity to be minimized. A Neyman-Pearson classifier minimizes a
type two error probability which is associated with the exterior
class (see Chapter II-C), while the classifier devised in this
work minimizes the region associated with the listed (known)
class. It can be shown that the two classifiers are identical (see
the examples in Chapter II-D) when the corrupting noise is
Gaussian with equal variances in all dimensions, the noise free
signal of the first class occupies only a single point in the
observation space, and the noise free signal of the exterior class
is normally and symmetrically distributed around the noise free
signal of the learning class. Extending this to the limiting case
of σ → ∞, we can assert that the two classifiers are identical
when the noise
free  signal  of  the  second  class  is  uniformly  distributed  over 
the  whole  observation  space  and  the  noise  is  Gaussian  of  equal 
variance  in  every  dimension.  This  kind  of  "equal  distribution  of 
the  noise  free  signal"  of  the  second  class,  which  happens  to  be 
the  unknown  class  in  the  problem  defined  here,  is  an  implicit 
assumption  in  the  minimum  volume  criterion,  but  the  criterion  can 
be  generalized  by  weighing  the  observation  space  according  to  the 
distribution  of  noise  free  signal  of  the  second  class.  However, 
it  minimizes  the  region  instead  of  the  error  probability  since  the 
former  is  strongly  linked  with  the  latter.  Since  the  noise  free 
signal  of  the  unknown  class  is  uniformly  distributed,  it  is  a 
reasonable  approach  not  to  weight  any  region  more  heavily  than 
others.  In  other  words,  the  criterion  should  weight  equally  every 
point  in  the  space.  This  leads  to  the  insensitivity  of  the  pro- 
posed classifier  to  the  distribution  of  the  noise  free  signal  of 
the  second  class. 

To illustrate the above behavior, an example is given below. Suppose there are two classes, Class I and Class II. There is one corresponding noise free signal for each class in a one-dimensional observation space. The noise free signal of Class I is known to be zero and that of Class II is unknown. The noise added to each class is Gaussian with zero mean and variance of one. When the error probability of identifying the known class (Class I) as the unknown class is fixed to be 0.05, according to the proposed criterion the region associated with the known class $Z_N$ is [-1.645, 1.645] and the probability of misclassifying the unknown class as the known class is

pe  erfc( I ,645-x)  *-  erfc( 1 .645+x)  (95) 

where x is the noise free signal of the unknown class and the function erfc(x) is defined in Equation (21).

The above is, of course, a fictitious result because we do not have any information about x, much less the error probability associated with it. It is plotted as curve A in Figure 23. The probability of error decreases as the absolute value of x increases. This is quite reasonable since the classifier distinguishes the two classes better as their responses are further separated.

If we use the Neyman-Pearson classifier to distinguish the two classes and the probability of a type one error is again fixed to be 0.05, the threshold is found to be either 1.96 or -1.96, depending on the position of x. Since we do not know x, the value 1.96 is arbitrarily chosen as the threshold (as if x were positive) and the probability of misclassifying the unknown class as the known class is shown in Figure 23 as curves $B_2$ (when x is negative) and $B_1$ (when x is positive). When x is positive,
Figure 23. The fictitious performances of three different classifiers.

A. The proposed classifier without any knowledge about the second class.

$B_1$-$B_2$. The Neyman-Pearson classifier without any knowledge about the second class.

C-$B_1$. The Neyman-Pearson classifier with the knowledge of both classes.


the Neyman-Pearson classifier here yields better performance than the proposed classifier. This is so because in this range the Neyman-Pearson classifier fully utilizes the available information (both x and zero, the response of the first class) while the proposed classifier only uses the information of one of the two classes. However, when x is negative, the threshold 1.96 for the Neyman-Pearson classifier is actually wrong. Therefore, the performance deteriorates drastically as -x gets large. This is also compared with curve C, where the classifier is designed with the information of both classes. Note that the Neyman-Pearson classifier becomes identical to the proposed classifier when the noise free signal of Class II is symmetrically distributed about zero. This can be shown by employing the formulas developed in Chapter II-D.
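The behavior of the curves in this example can be checked numerically. The following is a minimal sketch, assuming unit-variance Gaussian noise as in the example; the function names and the parameters c (half-width of the interval assigned to the known class) and t (one-sided threshold) are illustrative and do not appear in the original work. The probabilities are computed from the standard normal distribution function directly, so the sketch does not depend on the particular definition of erfc used in Equation (21). For reference, under unit variance a symmetric interval containing probability 0.95 has a half-width of about 1.96, and a one-sided threshold leaving probability 0.05 above it is about 1.645; both quantities are left as parameters.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def accept_prob_interval(x, c):
    """P(x + n lies in [-c, c]) for n ~ N(0, 1).

    This is the probability that an unknown object with noise free
    signal x is taken for the known class by an interval classifier.
    """
    return phi(c - x) - phi(-c - x)

def accept_prob_one_sided(x, t):
    """P(x + n < t) for n ~ N(0, 1).

    One-sided test designed as if x were positive: everything below
    the threshold t is assigned to the known class.
    """
    return phi(t - x)

# Tabulate both probabilities over a range of (hypothetical) positions x.
for x in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(x, accept_prob_interval(x, 1.645), accept_prob_one_sided(x, 1.96))
```

The interval probability is symmetric in x and falls off on both sides, whereas the one-sided probability approaches one as x becomes large and negative, which is the deterioration described above.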

Considering the sensitivity of the overall performance to the position of x, it is apparent that the proposed classifier is superior to the Neyman-Pearson classifier. When x is known, the latter yields the best performance. Nevertheless, the proposed classifier does yield a comparable performance, and its performance is insensitive to x, the response of the unknown class.

D. Approximation and Simulation

So far, we have been trying to find a region $Z_N$, specified by the proposed criterion, to provide decision surfaces. A close look at the theorem of Chapter III-B shows that if $\zeta_T$ is unique, it is not necessary to find $Z_N$ in order to make a classification. Whenever we want to identify an observed object, all we need is to obtain a $\zeta_T$ corresponding to the specified $\alpha$. The measurement x is then substituted into $\zeta = g(x)$ and compared with $\zeta_T$. If $\zeta$ is larger than or equal to $\zeta_T$, the test vector x is identified as being in $Z_N$ and otherwise, in $Z_X$.
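The threshold test can be sketched in a few lines. This is only an illustration under stated assumptions: the function name classify, the labels returned, and the one-dimensional standard normal density used in the usage lines are hypothetical, and the threshold is simply taken as the density value at an arbitrarily chosen point.

```python
import math

def classify(x, g, zeta_t):
    """Threshold test: 'listed' if g(x) >= zeta_T, 'unlisted' otherwise.

    g is any callable returning the density of the listed class at x;
    the region of the listed class never has to be constructed.
    """
    return "listed" if g(x) >= zeta_t else "unlisted"

def standard_normal_pdf(x):
    """Density of a single listed object at zero with unit-variance noise."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Hypothetical threshold: the density value at |x| = 1.96.
zeta_t = standard_normal_pdf(1.96)
print(classify(0.3, standard_normal_pdf, zeta_t))
print(classify(2.5, standard_normal_pdf, zeta_t))
```

Because the density here is unimodal and symmetric, comparing g(x) with the threshold is equivalent to testing whether x lies in a symmetric interval, but the same function works unchanged when the region is complicated.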

This process demonstrates that the implementation of the classifier can be very simple (Figure 24) no matter how complicated $Z_N$ is. Instead of finding the required region $Z_N$, the computation of $\zeta_T$ becomes the main task in designing the classifier for identifying unknown objects. The complexity of the technique therefore lies not in implementing the classifier but in computing the error probability and the threshold $\zeta_T$.


Figure 24. A simple implementation that reduces the classification to a threshold test.


The few examples worked out in the last few sections illustrate, however, that analytically computing $\zeta_T$ is extremely difficult for most cases. Two algorithms are devised here to alleviate this difficulty.

This can also be used as another independent way of identifying unknown objects. The optimum region $Z_N$ is formed by constructing hyperspheres around each signal point, and any noisy response falling into $Z_N$ can then be considered as one of the listed objects.

The approximation is especially good when it is applied to a higher dimensional space. This is so because the influence from each noise free point is smaller in a higher dimensional space. A hyperspherical shell of a specified thickness at a fixed distance from the noise free point in a higher dimensional space gets less probability contribution than that in a lower dimensional space when both are in the same noise environment. For instance, suppose the contaminating noise in every dimension is Gaussian with unit variance and zero mean, independent from one dimension to another. The probability contribution to a region bounded by radii 0.1 and 0.2 from the origin is then 0.07886 in a one dimensional space, 0.01481 in a two dimensional space, 0.00183 in a three dimensional space and so on. The contribution to any spherical volume surrounding a noise free point lying in this region (0.1 < r < 0.2) is further reduced by the higher dimensionality since the ratio of such a spherical volume to the spherical shell (0.1 < r < 0.2) decreases as the dimensionality increases.
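The shell probabilities quoted above can be reproduced with the closed-form distribution of the noise magnitude. The sketch below assumes independent zero-mean, unit-variance Gaussian noise in each dimension and covers only one, two and three dimensions; the function names are illustrative.

```python
import math

def radial_cdf(r, n):
    """P(||noise|| <= r) for n-dimensional N(0, I) noise, n in {1, 2, 3}."""
    if n == 1:
        return math.erf(r / math.sqrt(2.0))
    if n == 2:
        return 1.0 - math.exp(-r * r / 2.0)
    if n == 3:
        return (math.erf(r / math.sqrt(2.0))
                - math.sqrt(2.0 / math.pi) * r * math.exp(-r * r / 2.0))
    raise ValueError("closed form implemented only for n = 1, 2, 3")

def shell_probability(r_in, r_out, n):
    """Probability that the noise magnitude lies between r_in and r_out."""
    return radial_cdf(r_out, n) - radial_cdf(r_in, n)

for n in (1, 2, 3):
    print(n, shell_probability(0.1, 0.2, n))
```

The rapid decrease with n is the effect relied on in the argument: the same shell near a noise free point captures far less probability as the dimensionality grows.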

With the assumption of additive noise on each noise free signal $S_i$, the radius of each hypersphere is assumed to be the same and is computed for each designated $\alpha$. The result, in turn, can be used to obtain the threshold $\zeta_T$. Some examples employing this scheme in the detection of unknown aircraft targets are worked out in Chapter III-E.

Diagrammatically, the scheme is easy to implement and gives a good approximation in a high SNR system. The computation of the Type I error probability, an integration over the joint regions of several hyperspheres of a specified radius r, is difficult, however, in a high-dimensional space. A Monte Carlo simulation technique is employed later to circumvent this difficulty.

The second scheme is to apply a Monte Carlo simulation directly to the computation of $\zeta_T$. First, a train of random vectors $n_1, n_2, \ldots, n_m$ is generated according to the distribution of the noise added to each signal. Each random vector is added to its noise free signal to form a test vector

$$x_k = S_i + n_k . \qquad (96)$$

The test vector $x_k$ is substituted into $\zeta = g(x)$ to obtain a scalar $\zeta_k$, which is then stored in the data bank D. The number of random vectors added to each noise free signal is proportional to its corresponding a priori probability $P(S_i|N)$. After all the transformed $\zeta_k$'s are stored in D, they are lined up in order of magnitude and we designate the $[\alpha m]$th smallest $\zeta$ to be $\zeta_T$, where m is the total number of $\zeta_k$'s in D and [ ] is the symbol for the largest integer less than or equal to its argument.

It is clear that $\zeta_T$ splits the $\zeta_k$'s into two groups, those less than $\zeta_T$ and those that are not. Corresponding to this, the $\zeta$ space is divided into two sections $Z_X = [0, \zeta_T)$ and $Z_N = [\zeta_T, \infty]$.
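The Monte Carlo procedure just described can be sketched as follows. This is a minimal illustration, not the program used in this work: it assumes equal a priori probabilities, independent Gaussian noise of standard deviation sigma in every dimension, and a density g(x) given by the corresponding equal-weight Gaussian mixture. The signal points, the function names and the seeds are hypothetical.

```python
import math
import random

def g(x, signals, sigma):
    """Equal-prior mixture of isotropic Gaussians centred on the noise free signals."""
    n = len(x)
    norm = len(signals) * (2.0 * math.pi * sigma ** 2) ** (n / 2.0)
    return sum(math.exp(-math.dist(x, s) ** 2 / (2.0 * sigma ** 2))
               for s in signals) / norm

def draw_test_vector(rng, signals, sigma, k):
    """x_k = S_i + n_k; cycling over the signals gives each an equal share."""
    s = signals[k % len(signals)]
    return [si + rng.gauss(0.0, sigma) for si in s]

def monte_carlo_threshold(signals, sigma, alpha, m, seed=0):
    """Return the [alpha * m]th smallest value of g over m simulated vectors."""
    rng = random.Random(seed)
    bank = sorted(g(draw_test_vector(rng, signals, sigma, k), signals, sigma)
                  for k in range(m))
    rank = max(math.floor(alpha * m), 1)  # [alpha * m], kept at least 1
    return bank[rank - 1]

def miss_rate(signals, sigma, zeta_t, m, seed=1):
    """Fraction of fresh listed-class vectors whose g(x) falls below the threshold."""
    rng = random.Random(seed)
    misses = sum(g(draw_test_vector(rng, signals, sigma, k), signals, sigma) < zeta_t
                 for k in range(m))
    return misses / m

# Hypothetical two-dimensional signal points.
signals = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
zeta_t = monte_carlo_threshold(signals, sigma=1.0, alpha=0.05, m=4000)
print(zeta_t, miss_rate(signals, 1.0, zeta_t, m=4000))
```

The second function evaluates the threshold on an independent set of vectors; the resulting fraction should lie close to the designated value of alpha, up to sampling fluctuation.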
which goes to zero as m goes to infinity, showing that the concentration around the mean increases with m.

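We may then use the sample mean as an estimate of $\alpha$, denoting it $\hat{\alpha}$,

(102)

and obtain

$$\langle \hat{\alpha} \rangle = \alpha , \qquad (103)$$

indicating that $\hat{\alpha}$ is unbiased.

The variance of the estimate is

$$\sigma_{\hat{\alpha}}^2 = \langle (\hat{\alpha} - \alpha)^2 \rangle = \frac{\alpha\gamma}{m} , \qquad (104)$$

where $\gamma = 1 - \alpha$. The relative spread is

$$\frac{\sigma_{\hat{\alpha}}}{\alpha} = \sqrt{\frac{\gamma}{m\alpha}} . \qquad (105)$$

This represents the error spread of $\hat{\alpha}$ as a function of $\alpha$. Therefore, if we would like the estimate to be in error within some specific range, say $\epsilon$, we need only take the number of trials larger than or equal to

$$m = \frac{\gamma}{\epsilon^2 \alpha} . \qquad (106)$$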


For instance, if $\epsilon$ = 10% and $\alpha$ = 0.05, then $\gamma$ = 0.95 and we obtain from Equation (106) that m should be 1900. Any estimate using more than 1900 trials will yield better accuracy. Even for $\alpha$ = 0.01, only 9900 random vectors are required to have 90% accuracy, which makes the implementation of this simulation feasible in terms of computer time.
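The numbers in this example follow from treating each trial as a success-or-failure event with probability $\alpha$, so that the relative spread of the estimate is $\sqrt{(1-\alpha)/(m\alpha)}$ and the required number of trials is $(1-\alpha)/(\epsilon^2\alpha)$. A short sketch of this arithmetic is given below; the function names are illustrative, and the small tolerance subtracted before rounding up is only a guard against floating point round-off.

```python
import math

def relative_spread(alpha, m):
    """Relative standard deviation of the estimate of alpha after m trials."""
    return math.sqrt((1.0 - alpha) / (m * alpha))

def required_trials(alpha, eps):
    """Smallest number of trials giving a relative spread of at most eps."""
    m = (1.0 - alpha) / (eps ** 2 * alpha)
    return math.ceil(m - 1e-9)  # tolerance guards against round-off just above an integer

print(required_trials(0.05, 0.10))
print(required_trials(0.01, 0.10))
print(relative_spread(0.05, 10000))
```

The required number of trials grows as $1/\alpha$, so a smaller designated error probability needs proportionally more random vectors for the same relative accuracy.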

The value $\zeta_T$ can be used to obtain $Z_N$. Yet, as indicated before, the criterion does not require the system to construct $Z_N$.

The proposed criterion classifies the object into one of two categories, namely, the one consisting of the listed class and the other consisting of everything unlisted. The region associated with the former, as constructed by the criterion, actually provides a way of clustering different listed subclasses. This is especially useful in a multi-feature space, where the criterion yields a way of grouping together those subclasses most likely to belong together. By decreasing the value of $\alpha$, we expand the region $Z_N$ and hence have some subregions associated with the known subclasses merged into a new cluster, reducing the number of subregions. This is indeed a generalized single linkage hierarchical clustering[4] except that it operates in a multi-dimensional space. Figure 27 shows an example of forming clustering regions when employing four points embedded in Gaussian noise. Each point is from one of the four subclasses. The classes $C_3$ and $C_4$ are grouped to form a new region associated with both of them when $\alpha = \alpha_1$. The classes $C_1$ and $C_2$ are then grouped into another new one as $\alpha$ decreases to another value. All the subregions are eventually grouped into one single region as $\alpha$ decreases to a specific value depending on the distribution of the $S_i$'s. This illustrates a method of clustering in terms of the "influence" of each individual subclass.


E. Identifying Unlisted Aircraft - An Application

The above simulation technique is applied to an aircraft identification problem. It has been shown[6,7] that a low frequency method can effectively identify a large variety of airplanes. The features used are the electromagnetic scattering returns from the observed objects. Signal amplitude as well as phase information and the orthogonal polarizations of each scattering return are used to do the identification. Additive Gaussian noise with zero mean is assumed to be added to each noise free signal. A set of four aircraft (F104, F4, MIG19 and MIG21) is chosen to form the listed class. The a priori probabilities are assumed to be 25% for each class of aircraft. The scattering data of all the aircraft were computed by the moment method[14] at the Ohio State University ElectroScience Laboratory and the features (frequencies) are chosen to optimize a nearest neighbor classification of the four aircraft. Based on the above assumptions, the probability density function when a known object is present is


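$$g(x) = \frac{1}{4}\,\frac{1}{\left(\sqrt{2\pi\sigma^2}\right)^{n}} \sum_{i=1}^{4} \exp\left(-\frac{\|x - S_i\|^2}{2\sigma^2}\right) \qquad (107)$$

where

x is the observed signal vector,
$S_i$ is the noise free signal vector of the ith aircraft,
$\sigma$ is the noise standard deviation, and
n is the number of features used.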


The aircraft is assumed to be facing the observer nose on. We first apply the hypersphere approximation to this problem. The region $Z_N$ is constructed from the hyperspheres centered at each signal point with radius r. The object is identified as one of the listed objects if the observed vector lies in $Z_N$. From the previous argument, the probability contribution from $Z_N$ should total $1-\alpha$. For $\alpha$ = 5%, the total probability contribution over the region $Z_N$ is therefore 0.95. The Monte Carlo simulation is used to find the radius for three cases using different numbers of features and the results are shown in Table 2.

TABLE 2

The Approximated Radii Obtained by the Monte Carlo
Simulation for Four Aircraft Data,
$\alpha$ = 0.05, $\sigma^2$ = 1

n (Dimension)    Radius for $\alpha$ = 0.05    Radius for $\alpha$ = 0.05 when there is only
                                               one object in the known class

2                2.136                         2.42
4                2.864                         3.08
8                3.813                         3.94

Forty thousand test vectors were generated each time to assure the accuracy of the simulation. The first row in the table employs both components of a complex signal (the amplitude and phase of a scattering return constitute a complex signal) for a single frequency, horizontally polarized wave. The second row uses both vertical and horizontal polarizations of the same frequency signal. The third row uses both polarizations of two frequency returns simultaneously.

The second column in the table lists the data obtained by the simulation. The last column lists the radii for the cases when there is only one element in the known class. This is the same as the distance from the signal point within which the cumulative probability of the noise distribution is 95%. It is obvious from the table that the deviation of the radius from that of the single element case decreases as the dimensionality goes up (13% in the two-dimensional case and only 3% in the eight-dimensional case). This confirms our earlier argument that the influence of the subclasses on one another decreases as the dimensionality increases. At the same time, it also demonstrates that the single element radius is an increasingly good approximation as dimensionality increases.
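For a single signal point the radius needs no simulation, since the noise magnitude then follows a chi distribution. The sketch below assumes independent unit-variance Gaussian noise and an even number of dimensions, for which the distribution function has a finite closed form, and solves for the radius by bisection. The function names and the bracketing value of 50 are illustrative choices.

```python
import math

def radial_cdf_even(r, n):
    """P(||noise|| <= r) for n-dimensional N(0, I) noise, n even."""
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even integer")
    t = r * r / 2.0
    partial = sum(t ** j / math.factorial(j) for j in range(n // 2))
    return 1.0 - math.exp(-t) * partial

def single_point_radius(n, alpha, upper=50.0, iterations=200):
    """Radius of the hypersphere containing probability 1 - alpha (bisection)."""
    lower = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lower + upper)
        if radial_cdf_even(mid, n) < 1.0 - alpha:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)

for n in (2, 4, 8):
    print(n, single_point_radius(n, 0.05))
```

In two dimensions the result reduces to the closed form $\sqrt{-2\ln\alpha}$, which is about 2.45 for $\alpha$ = 0.05; for four and eight dimensions the computed radii are about 3.08 and 3.94.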

The implementation of the above is also very simple. A test vector x whose minimum distance to any of the four signal points is less than or equal to the obtained radius can be considered to be from the known class (one of F104, MIG19, F4 and MIG21), and vice versa.
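The hypersphere test amounts to a single minimum-distance comparison, as the following sketch shows. The signal points and the radius in the usage lines are hypothetical placeholders rather than the aircraft data, and the function name is illustrative.

```python
import math

def in_hypersphere_region(x, signals, radius):
    """True if x lies within `radius` of at least one noise free signal point."""
    nearest = min(math.dist(x, s) for s in signals)
    return nearest <= radius

# Hypothetical two-dimensional signal points and radius.
signals = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0), (6.0, 6.0)]
print(in_hypersphere_region((0.5, -1.0), signals, 2.1))
print(in_hypersphere_region((3.0, 3.0), signals, 2.1))
```

The cost of the test grows only linearly with the number of listed objects, independently of how complicated the union of hyperspheres is.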
The second scheme is to obtain the threshold for a prefixed $\alpha$ directly from the Monte Carlo simulation. This was carried out over similar cases and the results are listed in Table 3. Again, 10,000 test vectors were generated in each simulation. The accompanying CPU time in the table is the computer time needed for each simulation on the Datacraft 6400 at The Ohio State University ElectroScience Laboratory, which is about three times slower than an IBM 370/165. For the most complicated case here, only 604.39 ms is needed to finish the simulation. The accuracy, from Equation (105), is within 4.36% of the correct value for $\alpha$ being 5%. This illustrates that the method is effective and efficient.

An interesting experiment was carried out by introducing four other objects, MIG25, SR71, B1 and F14, chosen to represent a wide range of different shapes and sizes of aircraft. The probabilities of these new objects being classified into the listed class by the proposed classifier (described in Figure 24) were computed and the results are tabulated in Tables 4 and 5 for a two-dimensional case. Also given in the tables are the probabilities of classifying the listed objects into the unlisted class when the test vector originated from the known objects. Both tables list the probability of classifying each object out of the eight into the unlisted class. In the process, the test vector is substituted into Equation (107) and the resultant scalar is compared with the threshold $\zeta_T$, obtained by presetting $\alpha$ to be sequentially 0.05 and 0.10. Ten thousand test vectors were generated for each case and the noise added to the objects was again assumed to be Gaussian with zero mean and $\sigma$ set equal to 10 per cent of the average signal of the responses of the four listed objects. For the first four (listed) aircraft, the probabilities of classifying them into the unlisted class are close to 0.05 in the first rows and to 0.1 in the second rows of the table because we prefixed $\alpha$ to be 0.05 and 0.1 respectively. The probability of classifying an unlisted object into the listed class is zero for each of the unlisted objects in Table 4, indicating that the classifier is an excellent discriminator in this case. Note that the thresholds $\zeta_T$ change almost linearly as $\alpha$ changes from 0.05 to 0.10, indicating that the instability discussed in Chapter III-C does not occur in this case, although the thresholds $\zeta_T$ are small. This is a reasonable result since the rate of change of a Gaussian distribution (the function g(x) in this example) is always proportional to the value of the function at the point considered. This eliminates the flatness of g(x) over any regions in the observation space.

When $\sigma$ increases to twenty percent of the average signal, the same conclusion can also be drawn on the performances (Table 5), except that the probabilities for SR71 and B1 being classified into the listed class are not zero. This happens because (1) the responses of SR71 and B1 are closer to those of the listed objects (Table 6), and (2) $Z_N$ covers a larger area in the observation space when $\sigma$ is larger. Also, when $\alpha$ gets larger (i.e., the probability of misclassifying a listed object into the unlisted class gets larger), the region $Z_N$ associated with the listed class shrinks, making the probability of misclassifying an unlisted object into the listed class smaller. This is seen by comparing the first row and the second row in Table 5.

TABLE 6

The Noise Free Responses of the Eight Aircraft at the
Considered Frequency (24 MHz) at Nose-on Aspect

Aircraft:    F104, MIG19, F4, MIG21, MIG25, SR71, B1, F14
First row:   -0.526, -2.904, -5.076, 2.687, -3.723, 4.426, 17.864
Second row:  3.244, -5.908, 4.258, -5.283, -20.057, 2.828, -9.145, 0.363

Incidentally, the probability of classifying no object (null class) into the listed class was also computed and the results are zero for all the cases considered here.

When the dimensionality increases to four and higher, the classifier performs even better. The probability of misclassifying any of the listed objects into the unlisted class becomes 0.05 for all of the four listed objects and that of identifying an unlisted object as the listed class is zero for all of the four unlisted objects. This demonstrates that the proposed scheme is indeed a very effective one even when applied to a quite noisy environment.


CHAPTER  IV 

COMPLETE  CLASSIFICATION 


A. Introduction

As described in the first chapter, there are two steps in a "complete" classification procedure. The first is to decide whether the object to be identified is in the list of the known objects. If it is one of the catalogued objects, a conventional scheme is then employed in the next step to do the classification. If not, the object is designated to be a new object and a learning process is employed to estimate its characteristics.

This chapter attempts to show how this complete classification procedure can be conducted. The influence of the preclassification on the final classification and the strategies to be used are investigated. Some related problems are also discussed.


B.  The  Effect  of  Preclassification 

The technique developed in the previous two chapters involved the separation of the uncatalogued class from the catalogued class. For the convenience of the following discussion, this step will be called "preclassification" and the step of classifying the observed object as one of the listed objects after the preclassification will be called "final classification", or just "classification". The preclassification approach minimizes the region of the catalogued class in the observation space while keeping the probability of misclassifying the catalogued class at a fixed value. This tends to maximize the probability of correctly classifying an uncatalogued object while ensuring that the probability of misclassifying a catalogued target is below a prespecified level. We have shown that this region is constructed in such a way that no information about the uncatalogued class is needed. This makes the scheme useful in a practical situation, where usually no information about the uncatalogued class is available.

In the final classification, only the response that falls into $Z_N$, the region associated with the listed class defined in the last chapter, is used. Most of the observation space will thus be excluded. This will have some influence on the final classification after the observed object is determined to be in the listed class.

In Chapter III-A, we have shown that $Z_N$ excludes the region where the probability density of the observed vector x is small compared with its density in $Z_N$. This indicates that the preclassification procedure excludes only the portion of the feature space where the measured vectors are least likely to occur. If $\alpha$ is small enough, the region $Z_N$ should contain all the regions which are significant in the final classification process. In this case, the preclassification has very little effect on the final classification. As $\alpha$ increases, $Z_N$ shrinks and the impact of the preclassification becomes noticeable. However, the regions corresponding to the lowest probability density are usually far away from the uncorrupted signals of the listed objects, where the classification is more likely to make an error. Consequently, the exclusion of these regions is usually not detrimental to the overall performance of the complete classification process. Yet, the total impact of preclassification depends on the data distribution of the listed objects as well as the misclassification probability $\alpha$.

To demonstrate the influence of preclassification, an example using the Bayes approach as a means of final classification is shown here. Consider the example given in Figure 20. Let the distance between the two objects be 2d, $\sigma$ be the standard deviation of the noise added to each signal, and the probability of misclassifying any of the listed objects ($S_1$ or $S_2$ in Figure 20) as unlisted be $\alpha$. If we use the Bayes classifier to do the classification of the two objects directly (without going through the preclassification), the average probability of misclassification is shown by the curve designated as $\alpha$ = 0 in Figure 28. If we first apply the preclassification process and use the Bayes classifier to do the classification after determining the object to be in the listed class, the results are shown by the rest of the curves in Figure 28, for various values of $\alpha$. As mentioned above, at the final classification the feature space is shrunk to $Z_N$ since only the response that falls into $Z_N$ is used for the Bayes test. Therefore the probability of misclassification for the final classification is a conditional probability obtained by dividing the misclassification probability by the probability of the response falling into $Z_N$. When $\alpha$ increases the average probability of misclassification increases, but not significantly. All the curves for different $\alpha$'s converge to 0.5 as d approaches zero. They all go down to zero as d goes up to infinity, which would be expected since the classification error probability approaches zero when the two noise free signals are separated by an infinite distance.
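The conditional error probability described here can be estimated by simulation. The following sketch rests on assumptions not confirmed by the text: the two signals are placed at -d and +d in a one-dimensional observation space, the a priori probabilities are equal, and the same simulated vectors are used both to set the preclassification threshold and to count errors. The function names, sample size and seed are illustrative.

```python
import math
import random

def g(x, d, sigma):
    """Equal-prior mixture density of two signals at -d and +d (one dimension)."""
    c = 1.0 / (2.0 * math.sqrt(2.0 * math.pi) * sigma)
    return c * (math.exp(-(x + d) ** 2 / (2.0 * sigma ** 2))
                + math.exp(-(x - d) ** 2 / (2.0 * sigma ** 2)))

def conditional_error(d, sigma, alpha, m=20000, seed=1):
    """Error rate of the Bayes test among responses that pass preclassification."""
    rng = random.Random(seed)
    samples = []
    for k in range(m):
        s = d if k % 2 == 0 else -d          # alternate signals: equal priors
        samples.append((s, s + rng.gauss(0.0, sigma)))
    zetas = sorted(g(x, d, sigma) for _, x in samples)
    zeta_t = zetas[max(math.floor(alpha * m), 1) - 1]
    # Keep only the responses that the preclassification accepts as listed.
    kept = [(s, x) for s, x in samples if g(x, d, sigma) >= zeta_t]
    # Bayes rule for equal priors and equal variances: decide by the sign of x.
    errors = sum(1 for s, x in kept if (x >= 0.0) != (s > 0.0))
    return errors / len(kept)

for alpha in (0.0, 0.05, 0.2):
    print(alpha, conditional_error(1.0, 1.0, alpha))
```

With $\alpha$ = 0 every response is kept and the estimate approaches the ordinary Bayes error for two equally likely Gaussian signals; for larger $\alpha$ the low-density tails are removed and the conditional error rises moderately.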


C. Identification Among the Known Objects

We demonstrate here, with a practical aircraft identification problem, how the two step classification works. As shown in Chapter III-E, the electromagnetic scattering returns from the observed object are used as the test features. The selection of the features (e.g., frequencies and polarizations)[6,7] should depend solely on the data distributions of the catalogued objects since they are the only information available.

In the case that the probability distributions are known, the optimum decision rule is that of Bayes, where the misclassification probability is minimized. The average probability of error will be

$$P_e = 1 - \int_Z \max_K \left( p_K P(x|C_K) \right) dx \qquad (108)$$

where

$C_K$ denotes the Kth object in the catalogued class,
$p_K$ is the a priori probability of $C_K$,
$P(x|C_K)$ is the probability distribution of the test vector x when $C_K$ is present, and
Z is the feature space defined before.

The integration is carried over the whole feature space by picking the maximum of $p_K P(x|C_K)$ among all K such that the error probability is minimized.

In many cases it may be difficult to evaluate $P_e$ analytically. This is so even if the distribution $P(x|C_K)$ is known exactly, since it is difficult to describe analytically the region which gives the maximum value. Also, the computation of $P_e$ becomes extremely complicated when the number of objects is large. Therefore, a nearest neighbor (N.N.) method is used below.

An N.N. rule can be described as follows. Given training samples $\{S_i^K\}$ for each object $C_K$, the rule is to classify the tested point x as a member of the class $C_r$ to which its nearest neighbor belongs, i.e.,

$$x \in C_r, \quad \text{if } \|x - S_j^r\| = \min_{i,K} \|x - S_i^K\| \qquad (109)$$

where $C_r$ is one of the known classes and we use the notation for Euclidean distance,

$$\|x - S_i^K\| = \left( [x - S_i^K]^T [x - S_i^K] \right)^{1/2} . \qquad (110)$$
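The N.N. rule translates directly into code. The sketch below is illustrative only: the dictionary layout of the training samples, the class labels and the sample points are hypothetical, and ties are resolved in favor of the first sample encountered.

```python
import math

def nearest_neighbor_class(x, training):
    """Return the label of the class owning the training sample nearest to x.

    training maps each class label to a list of sample vectors; the
    distance is Euclidean.
    """
    best_label, best_dist = None, math.inf
    for label, samples in training.items():
        for s in samples:
            dist = math.dist(x, s)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label

# Hypothetical noise free responses used as the only training samples.
training = {"A": [(0.0, 0.0)], "B": [(5.0, 5.0)], "C": [(-4.0, 3.0)]}
print(nearest_neighbor_class((1.0, 0.5), training))
print(nearest_neighbor_class((-3.0, 2.0), training))
```

When each class contributes a single noise free response, as in the aircraft problem, the rule reduces to choosing the closest signal point.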


An analytical calculation of the probability of error for the nearest neighbor classification is also very involved because it would require the integration of a multivariate density function over extremely complicated boundaries. For this reason, the errors are obtained by Monte Carlo simulation.

The N.N. classifier has a practical advantage over the Bayes classifier in that the difficulties of determining the boundaries of the integration are eliminated. It is also nonparametric and consequently is applicable to classification problems where the statistics of the noise and signals are not known. Cover and Hart[15] have shown that in the large sample case, the probability of error for the N.N. rule is bounded above by twice the Bayes probability of error, and is clearly bounded below by the Bayes error. For most practical situations these bounds are sufficiently tight to indicate the merit of the N.N. rule. Therefore, in the following examples, the N.N. rule is employed as the final classification to compute the overall performance of the classifier. In the aircraft identification problem, the training samples are just the noise free responses from the aircraft to be identified and the test samples are these responses to which Gaussian noise is added. As stated in Chapter III-E, the a priori probabilities of the aircraft subclasses in the catalogued class are assumed to be all the same. Therefore, the probability function of the test vector x when N is true is
where M is the number of objects in the catalogued class and x,
S_i, σ and n are all defined as in Equation (106).


D. Two Step Classification

Let the listed class consist of five American airplanes:
F104, F4, SR71, B1 and F14. The electromagnetic scattering returns
were, as mentioned before, numerically computed at the Ohio State
University ElectroScience Laboratory [14]. The optimum frequencies
were selected by using the N.N. rule to do the classification.
They were found to be 24 MHz for single frequency features, and
20 MHz and 24 MHz for two frequency features.

A preclassification process is first employed to test whether
an observed object is listed. To do this, we obtain the
probability density function of an observed vector x when a
catalogued object is present. This is g(x) in Equation (111),
where g(x) is shown to be a function of all scattering returns as
well as of the noise. The thresholds ξ_γ are therefore computed
in terms of the noise standard deviation σ and listed in Table 7
for α = 0.05.
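One way to picture the threshold computation is the following Python sketch. It rests on assumptions that are not confirmed by the text shown here: g(x) is taken to be an equal-weight mixture of isotropic Gaussian densities centred on the noise free responses, and the threshold is taken as the empirical α-quantile of g(x) over simulated noisy responses of listed objects. All function names and the two-feature responses are hypothetical.

```python
import math
import random

def mixture_density(x, centers, sigma):
    """Assumed g(x): equal-weight mixture of isotropic Gaussians, one per listed object."""
    n = len(x)
    norm = (2.0 * math.pi * sigma ** 2) ** (-n / 2.0)
    kernel = sum(math.exp(-math.dist(x, c) ** 2 / (2.0 * sigma ** 2)) for c in centers)
    return norm * kernel / len(centers)

def noisy_listed_sample(centers, sigma, rng):
    """Pick a listed object with equal probability and add Gaussian noise to its response."""
    center = rng.choice(centers)
    return [value + rng.gauss(0.0, sigma) for value in center]

def estimate_threshold(centers, sigma, alpha, trials=10000, seed=0):
    """Empirical alpha-quantile of g(x) over noisy responses of listed objects."""
    rng = random.Random(seed)
    values = sorted(
        mixture_density(noisy_listed_sample(centers, sigma, rng), centers, sigma)
        for _ in range(trials)
    )
    return values[int(alpha * trials)]

def is_listed(x, centers, sigma, threshold):
    """Threshold test: accept x as listed when g(x) is at least the threshold."""
    return mixture_density(x, centers, sigma) >= threshold

# Hypothetical noise free responses with two features each (made-up values)
centers = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
threshold = estimate_threshold(centers, sigma=0.5, alpha=0.05)
```

With this construction roughly a fraction α of the simulated listed responses falls below the threshold, which is the prescribed type one error.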

The numbers shown in the table under the "case" column are
explained in Table 3 in Chapter III-E. For instance, case (1)
represents the system utilizing one frequency, horizontally
polarized electromagnetic returns, and so on. A represents the
average amplitude of all the listed objects.

When σ is greater than 1/√(2π), which is the case in all the
examples here, it is seen from the table that ξ_γ decreases as σ
increases. This is because g(x) in Equation (111) goes down as σ
becomes larger, reducing the amplitude of the probability density
in the vicinity of its maximum. This is also true when the number
of features n increases in Equation (111).

TABLE 7

The Thresholds Computed for α = 0.05 at Three Different
Noise Levels. The Listed Class Consists of Five
American Aircraft: F4, F104, SR71, B1 and F14

Case          Dimensions   σ = 0.1 × A       σ = 0.2 × A       σ = 0.3 × A
(illegible)   2            6.3054 × 10^-3    2.1445 × 10^-3    1.3814 × 10^-3
(illegible)   2            5.0052 × 10^-3    1.3613 × 10^-3    6.3608 × 10^-4
(illegible)   4            5.0133 × 10^-4    3.133 × 10^-5     6.22 × 10^-6
(illegible)   8            5.6813 × 10^-6    2.2192 × 10^-8    8.6591 × 10^-10

Once the thresholds ξ_γ are decided, the experiment described
in Figure 24 in Chapter III-D can be used to test the
classification of these five American made airplanes. Three other,
foreign made, aircraft (MIG19, MIG21 and MIG25) are added to the
preclassification test and the results are shown in Table 8 for
σ = 0.1 × A for case (1).

The probabilities of misclassification are computed by
Monte Carlo simulation and the result for each aircraft is listed.
The average probability of preclassification error for the five
listed aircraft is 0.0477, which is close to the theoretical
value 0.05 (since we set α = 0.05). The difference is caused by
computational error and is so small that it can be considered in
agreement with the design value 0.05. The classification by the
N.N. rule, after the object is assigned to the listed class, is
also computed and the errors are negligible when the noise is ten
percent of the average amplitude return.
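The two step experiment can be sketched as follows. This is a toy illustration rather than the simulation used for Tables 8 to 16: the object names and responses are invented, g(x) is again assumed to be an equal-weight Gaussian mixture, and the threshold uses a closed form that holds only for two features and well separated objects, where P(g(x) < t) is approximately t · M · 2πσ².

```python
import math
import random

def g(x, centers, sigma):
    """Assumed density of x when a listed object is present (equal-weight Gaussian mixture)."""
    norm = (2.0 * math.pi * sigma ** 2) ** (-len(x) / 2.0)
    kernel = sum(math.exp(-math.dist(x, c) ** 2 / (2.0 * sigma ** 2)) for c in centers)
    return norm * kernel / len(centers)

def two_step(x, listed, sigma, threshold):
    """Step 1: threshold test on g(x). Step 2: nearest neighbor among listed objects."""
    if g(x, list(listed.values()), sigma) < threshold:
        return "unlisted"
    return min(listed, key=lambda name: math.dist(x, listed[name]))

def error_rates(listed, unlisted, sigma, threshold, trials=4000, seed=1):
    """Monte Carlo estimate of the misclassification probability of each object."""
    rng = random.Random(seed)
    rates = {}
    for name, response in {**listed, **unlisted}.items():
        truth = name if name in listed else "unlisted"
        wrong = 0
        for _ in range(trials):
            x = [v + rng.gauss(0.0, sigma) for v in response]  # noisy test sample
            wrong += two_step(x, listed, sigma, threshold) != truth
        rates[name] = wrong / trials
    return rates

# Hypothetical noise free responses (made-up values, two features)
listed = {"A": [0.0, 0.0], "B": [4.0, 0.0]}
unlisted = {"U": [20.0, 20.0]}
sigma, alpha = 0.5, 0.05
# Shortcut valid only for this toy case (n = 2, well separated objects)
threshold = alpha / (len(listed) * 2.0 * math.pi * sigma ** 2)

rates = error_rates(listed, unlisted, sigma, threshold)
print(rates)
```

In this toy case the listed objects are rejected at a rate near α, while the distant unlisted object is never accepted.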

Table 9 shows the probabilities of misclassification when
σ equals twenty percent of the average amplitude returns. The
overall performance at the preclassification stage is still very
good; however, the classification among the listed objects
deteriorates quite significantly. The fact that the listed
airplanes can be distinguished from the unlisted ones while they
cannot be well identified among themselves stems from the way the
two classifications are set up. The preclassification process
classifies two major classes, namely, listed and unlisted. The
errors committed in the detailed classification among the
subclasses of the listed class are not considered as errors in the
preclassification process unless the objects are identified as
unlisted.

When  the  noise  level  increases,  the  identification  of  the 
listed  objects  worsens  further,  although  it  is  still  relatively 
easy  to  distinguish  them  from  the  unlisted  ones  (Table  10). 

As discussed in [6,7], one way of improving the overall
performance of classification among the known objects is to
increase the dimensionality of the feature vector. This was done
by employing both vertically and horizontally polarized radar
returns. The results are shown in Tables 11-13 for σ = 0.1 to 0.3
of the average return of the listed aircraft.


TABLE 10: Error Probability of Classification

At the preclassification stage, the overall performance is
still similar to the previous results. Yet identification among
the listed aircraft improves drastically when more features are
used. This is in agreement with the identification results of an
N.N. classifier without the preclassification step, where the more
features are used, the better the classifier performs. Note that
the increase in dimensionality also improves the separability
between the unlisted and listed aircraft, although not
significantly in this example.

When one increases the dimensionality to eight, i.e., utilizing
two frequency returns simultaneously, the overall performance is
further improved. The separation between the listed and unlisted
classes (for the three aircraft added here) becomes very large.
The error probabilities of classifying the unlisted aircraft as
listed ones approach zero even when σ increases to thirty percent
of the average return (Tables 14-16).

1. Effect of the Type One Error Probability α in the
Preclassification Process

The threshold ξ_γ is constructed by presetting the type one
error probability α to a fixed value; therefore a change of α
has some influence on the overall performance. Some implications
of this were discussed in Section C, where a Bayes classifier was
employed to do the second step classification. In this section,
we demonstrate the effect of changing α when using an N.N.
classifier to do the classification of the catalogued objects.


Since, for α = 0.05, the other misclassification probability
is zero, there is an imbalance between the two types of error and
the overall error is larger than need be. Reducing α will thus
tend to reduce the overall error. Its value is decreased to 0.01
to test the performance.

Again, the thresholds for the preclassification are computed
first. The five American airplanes are included in the listed
class. Since the listed aircraft are still the same, the optimum
features selected before are unchanged too. The thresholds for
five different cases at three noise levels are computed and listed
in Table 17. Since α is five times smaller than before, according
to Equation (106), the number of tested vectors required to yield
the same computational accuracy should increase almost five fold,
implying that if the same number of random vectors is used in the
computation of the thresholds ξ_γ, the computation error will
increase. Still, if one uses 10,000 random vectors, the deviation
of the classification error will be kept to less than 10% of the
actual value.
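The accuracy argument can be checked with a short calculation. Equation (106) is not reproduced in this section, so the sketch below assumes the usual binomial model for a Monte Carlo estimate of a probability, in which the relative standard deviation is sqrt((1 - α) / (N α)) for N independent test vectors. The function names are hypothetical.

```python
import math

def relative_deviation(alpha, trials):
    """Relative standard deviation of a Monte Carlo estimate of a probability alpha.

    Assumes independent trials, so the error count is binomial(trials, alpha).
    """
    return math.sqrt((1.0 - alpha) / (alpha * trials))

def trials_needed(alpha, rel_dev):
    """Number of trials giving the requested relative standard deviation."""
    return math.ceil((1.0 - alpha) / (alpha * rel_dev ** 2))

for alpha in (0.05, 0.01):
    print(alpha, round(relative_deviation(alpha, 10000), 4))

# Ratio of the trial counts needed for the same accuracy at alpha = 0.01 and 0.05
print(trials_needed(0.01, 0.05) / trials_needed(0.05, 0.05))
```

Under this assumption the required number of vectors scales roughly as 1/α, and 10,000 vectors at α = 0.01 give a relative standard deviation just under ten percent.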

Again, when only using the data of the horizontally polarized
wave at the frequency of 24 MHz, the classifier performs quite
satisfactorily even when α is decreased to 0.01 and σ equals ten
percent of the average response of the listed aircraft. The
average error probability of the preclassification becomes
0.00804, which is close to the specified value 0.01 (Table 18).
The misclassification among the listed aircraft at the second step
classification is still kept as low as it is at α = 0.05. However,
since α decreases, the region associated with the listed class in
the feature space expands, increasing the tendency of identifying
an unlisted object as listed. Although this is not evident in
Tables 18 and 19 where the noise levels are low, it shows up in
Table 20 where the noise level increases to thirty percent of the
average response of the listed aircraft. Of course, this kind of
error probability depends on the response of the unlisted objects.
The effect of α on the overall performance is: the smaller α is,
the more probable it is that the preclassifier will identify an
unlisted object as one of the listed objects. Note that the
influence of α on the performance of the second step
classification is negligibly small. The results in Tables 18-20 do
not change very much from those in Tables 9-11. A general
conclusion cannot be drawn based on these because the differences
are so small. They are probably caused by numerical computation
error.

TABLE 17

The Thresholds Computed for α = 0.01 at Three Different
Noise Levels. The Listed Class Consists of Five
American Aircraft: F4, F104, SR71, B1 and F14

Case          Dimensions   σ = 0.1 × A       σ = 0.2 × A       σ = 0.3 × A
(illegible)   2            1.2504 × 10^-3    5.3065 × 10^-4    3.153 × 10^-4
(illegible)   2            9.982 × 10^-4     2.745 × 10^-4     1.510 × 10^-4
(illegible)   4            8.524 × 10^-5     0.533 × 10^-5     0.108 × 10^-5
(illegible)   8            4.382 × 10^-7     1.712 × 10^-9     6.679 × 10^-11


Tables 21-23 list some of the results when using the
vertically polarized returns at the same frequency. The
distribution of the noise free responses of the listed objects is
not the same as that in the previous case. Hence the
classification among the known objects changes quite a bit.
Nevertheless, the overall performance in separating the listed
class from the unlisted class is still very good.

The error probability of identifying the unlisted class
as the listed class is on the average less than ten percent
even when the noise standard deviation σ is thirty percent of the
average response. The error probability for the opposite direction
of identification is close to 0.01, in agreement with the
specified value.

Finally, we show that, for the special case of using both
polarized wave returns at one frequency, the results are good
enough for the preclassification error even under the constraint
of α = 0.01. The identification among the known objects also
performs very satisfactorily when utilizing the N.N. rule to do
the second step classification. The results are shown in
Tables 24-26.

F.  Groupings  and  Strategies 

The scheme developed to do the classification between the
listed and unlisted classes can serve not only as a means of
preclassification, but also as an intermediate step in the
complete classification. This is especially beneficial in a
classification involving a large number of classes.


TABLE 24

Probabilities of Misclassification at σ = 0.1 × A When Using Four Features (Vertical and
Horizontal Polarizations). The Listed Class Consists of Five American Aircraft
Consider a situation where an object is to be identified
as one of many possible objects in the known list. A conventional
method will require the response of this object to be compared
with those of the known objects in some predetermined way. For
instance, in an N.N. classifier, the Euclidean distances of this
response to the noise free points of all the listed objects are
computed and the minimum of these distances is then used to
classify the observed object. The computation and comparison of
these distances usually consume a lot of time if the number of
listed objects is large. In general, the time for picking a
minimum (or maximum) value is proportional to n, the number of
items from which the choice is made. A conventional scheme becomes
very slow and inefficient because of this. Moreover, the
classifier inevitably becomes complicated when the number of
objects is large.

One way of tackling the above problem is to group all the
listed objects into several subgroups and use the
preclassification scheme developed in this work as an intermediate
step of classification. One of the subgroups can be considered as
the listed class and the scheme is employed to do the
preclassification. Once it is decided that the object is in this
subgroup, a conventional scheme can be used to identify it as one
of the objects in this subgroup. If not, the next subgroup can be
considered as the listed class and the same process is applied
until the observed object is identified. Of course, in the whole
procedure the objects with higher a priori probabilities should be
subgrouped and used as the first listed class. By doing so, the
number of tests would be reduced since we test the most likely
occurring objects first.
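The subgroup procedure described above can be sketched as a loop over subgroups ordered by decreasing a priori probability. The sketch is illustrative only: the Gaussian-mixture form of g(x), the object names, the responses and the threshold values are all hypothetical, and returning "unlisted" when every subgroup rejects the object is an assumption about how the procedure ends.

```python
import math

def g(x, centers, sigma):
    """Assumed density of x given the subgroup (equal-weight Gaussian mixture)."""
    norm = (2.0 * math.pi * sigma ** 2) ** (-len(x) / 2.0)
    kernel = sum(math.exp(-math.dist(x, c) ** 2 / (2.0 * sigma ** 2)) for c in centers)
    return norm * kernel / len(centers)

def classify_by_subgroups(x, subgroups, thresholds, sigma):
    """Test each subgroup in turn as the listed class.

    subgroups: list of dicts (name -> noise free response), most probable first.
    thresholds: one preclassification threshold per subgroup.
    """
    for group, threshold in zip(subgroups, thresholds):
        if g(x, list(group.values()), sigma) >= threshold:
            # Accepted by this subgroup: finish with the N.N. rule inside it
            return min(group, key=lambda name: math.dist(x, group[name]))
    return "unlisted"  # rejected by every subgroup (assumed termination rule)

# Hypothetical subgroups of objects with similar responses (made-up values)
subgroups = [
    {"A": [0.0, 0.0], "B": [1.0, 0.0]},
    {"C": [10.0, 10.0], "D": [11.0, 10.0]},
]
thresholds = [0.01, 0.01]  # illustrative values, not computed from any table

print(classify_by_subgroups([10.9, 10.1], subgroups, thresholds, sigma=0.3))
```

Each stage needs one threshold comparison plus a nearest neighbor search confined to a single subgroup, which is where the saving over a search through the whole list comes from.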

The procedure would at least yield the following advantages:

(1)  The basic classifiers can be greatly simplified since
the number of objects in each identification is small.

(2)  Objects with similar responses can be grouped into the
same subgroup in the first classification so that the
classification error probabilities can be made small.
Incidentally, the clustering process described in
Chapter III-D can serve as a grouping method.

(3)  The classifier identifies each subgroup in sequence,
hence the objects considered most important (or with
the highest a priori probabilities) can be subgrouped
and used as the first listed class. In this way,
the objects can be grouped according to the importance
of the identification of each one, and the classifier
carries out the classification in a specified order of
priorities.

Note that the preclassifier developed by the proposed
criterion only compares the value g(x) with the designated
threshold ξ_γ; the process is much simpler than that needed for
an ordinary classification of several objects. Therefore, the
preclassification is much less time consuming than a conventional
classifier.


However, the preclassification process yields some
classification error in each step of identification. For
successive identifications of several subgroups this process will
definitely worsen the overall performance; the extent to which
this happens depends on the prefixed error probability α as well
as on the data distribution. Since this is also related to the
order of the subgroups chosen, an optimum strategy for this system
has to be defined. Further study is necessary to determine the
strategy.

A technique has been developed to discriminate listed
objects from unlisted ones. It is based on the principle of
minimizing the probability of error in an identification process.
Since no information regarding the unlisted objects is available,
instead of minimizing the overall probability of
misclassification the method prefixes the probability of
misclassifying a listed object as an unlisted one and minimizes
the region associated with the listed class in the feature space.
This minimizes the likelihood of misclassifying unlisted objects
as listed ones.

It was proved that the devised classifier could be
implemented as a threshold test. The employment of the latter
greatly simplifies the design of the classifier. The classifier
was applied to an aircraft identification problem. It was shown
that the error probability of misclassifying catalogued targets as
uncatalogued and vice versa can be made very small, while keeping
a high probability of correct identification when the presence
of a listed object is detected. The misclassification probability
for a specific case of three unlisted objects and five listed
ones was computed and was found to approach zero when the number
of features used was as low as four. The additional step of
discriminating listed objects from unlisted ones produced very
little degradation of the overall classification performance.
The overall misclassification probability for all cases considered
was changed by less than five percent. The implementation of the
developed scheme was shown to be simple and efficient.

The technique does not need to utilize any information about
the unlisted class to carry out the classification. However, some
a priori knowledge of the unlisted class is sometimes available.
For instance, the responses of the unlisted objects might well be
confined to a restricted region of the observation space. We can
use the technique developed in Chapter II-D to tackle this kind
of problem.

Some  information  on  unlisted  objects  can  be  obtained  from 
the  observed  response.  For  example,  once  the  observed  object  is 
determined  to  be  a new  one,  a learning  process  is  employed  to 
estimate  the  characteristic  of  this  new  object,  which  can  be  used 
as  the  a priori  information  for  the  classification  process.  This 
would  lead  to  a modification  of  the  minimum  volume  criterion. 

Since  the  response  of  any  unlisted  object  was  assumed  to  be 
unknown  in  this  study,  no  further  investigation  was  conducted 
along  this  line. 

Another problem that occurs frequently is that the
parameters of the listed class are not known. This turns out to be
a nonparametric classification problem of identifying the
unlisted class as distinct from the listed class. No attempt has
been made to deal with this problem, but it should be investigated.


A procedure for identifying a subclass of a large number
of objects was discussed. The classification utilizing the
devised classifier is essentially an elimination process. The
subgroup bearing the highest a priori probabilities is tested
first and eliminated from the list if it is decided that the
observed object does not belong to this subgroup. The detailed
classification procedure depends on the cost function assigned
to the number of tests needed, the probability of
misclassification and the complexity of the classifier [16]. This
approach, which was outlined in Chapter IV, requires more thorough
study.